100+ datasets found
  1. miners-detection

    • huggingface.co
    Updated Sep 20, 2023
    + more versions
    Cite
    Training Data (2023). miners-detection [Dataset]. https://huggingface.co/datasets/TrainingDataPro/miners-detection
    Explore at:
    Dataset updated
    Sep 20, 2023
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    The dataset consists of photos captured within various mines, focusing on miners engaged in their work. Each photo is annotated with bounding boxes around the miners, and an attribute indicates whether each miner is sitting or standing in the photo.

    The dataset's diverse applications, such as computer vision and safety assessment, make it a valuable resource for researchers, employers, and policymakers in the mining industry.

  2. Logs and Mined Sequential Patterns of Programming Processes from...

    • figshare.com
    txt
    Updated Jun 3, 2023
    Cite
    Minji Kong; Lori Pollock (2023). Logs and Mined Sequential Patterns of Programming Processes from "Semi-Automatically Mining Students' Common Scratch Programming Behaviors" [Dataset]. http://doi.org/10.6084/m9.figshare.12100797.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    figshare
    Authors
    Minji Kong; Lori Pollock
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present a ProgSnap2-based dataset containing anonymized logs of over 34,000 programming events exhibited by 81 programming students in Scratch, a visual programming environment, during our designed study as described in the paper "Semi-Automatically Mining Students' Common Scratch Programming Behaviors." We also include a list of approximately 3,100 mined sequential patterns of programming processes that are performed by at least 10% of the 62 novice programmers among the 81 students; these represent maximal patterns generated by the MG-FSM algorithm while allowing a gap of one programming event. The dataset comprises the following files:

    • README.txt — overview of the dataset and its properties
    • mainTable.csv — main event table of the dataset holding rows of programming events
    • codeState.csv — table holding XML representations of code snapshots at the time of each programming event
    • datasetMetadata.csv — describes features of the dataset
    • Scratch-SeqPatterns.txt — list of sequential patterns mined from the Main Event Table
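    As a quick illustration of how these files might be explored, here is a minimal Python sketch (ours, not part of the dataset) that loads the main event table with pandas and counts events per student; the SubjectID column name follows the ProgSnap2 convention and should be checked against README.txt.

    import pandas as pd

    # Load the main event table and the code snapshots (file names as listed above).
    events = pd.read_csv("mainTable.csv")
    code_states = pd.read_csv("codeState.csv")

    # Inspect which columns are actually present before relying on them.
    print(events.columns.tolist())

    # Number of logged events per student, assuming a ProgSnap2-style SubjectID
    # column; adjust the name if README.txt specifies otherwise.
    if "SubjectID" in events.columns:
        print(events.groupby("SubjectID").size().describe())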

  3. Human-Computer Interaction Logs

    • indigo.uic.edu
    zip
    Updated May 30, 2023
    Cite
    Julian Theis; Houshang Darabi (2023). Human-Computer Interaction Logs [Dataset]. http://doi.org/10.25417/uic.11923386.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    University of Illinois Chicago
    Authors
    Julian Theis; Houshang Darabi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises ten human-computer interaction logs of real participants who solved a given task in a Windows environment. The participants were allowed to use the standard notepad, calculator, and file explorer. All recordings are anonymized and do not contain any private information.

    Simple: Each of the five log files in the folder simple contains Human-Computer Interaction recordings of a participant solving a simple task. Participants were provided 30 raw text files, each containing data about the revenue and expenses of a single product for a given time period. In total, 15 summaries were to be created by summarizing the data of two files and calculating the combined revenue, expenses, and profit.

    Complex: Each of the five log files in the folder complex contains Human-Computer Interaction recordings of a participant solving a more advanced task. In particular, participants were given a folder of text documents and were asked to create summary documents that contain the total revenue and expenses of the quarter, profit, and, where applicable, profit improvement compared to the previous quarter and the same quarter of the previous year. Each quarter's data comprised multiple text files.

    The logging application that has been used is the one described in: Julian Theis and Houshang Darabi. 2019. Behavioral Petri Net Mining and Automated Analysis for Human-Computer Interaction Recommendations in Multi-Application Environments. Proc. ACM Hum.-Comput. Interact. 3, EICS, Article 13 (June 2019), 16 pages. DOI: https://doi.org/10.1145/3331155

    Please refer to Table 1 and Table 2 of this publication regarding the structure of the log files. The first column corresponds to the timestamp in milliseconds, the second column represents the event key, and the third column contains additional event-specific information.
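    Given the three-column structure just described, a minimal Python sketch (ours; the delimiter and file name are assumptions to verify against the actual files and Tables 1 and 2 of the paper) for reading one log could be:

    import csv

    # Read one interaction log, assuming tab-separated columns:
    # timestamp in milliseconds, event key, optional event-specific info.
    def read_log(path):
        events = []
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f, delimiter="\t"):
                if len(row) < 2:
                    continue
                events.append((int(row[0]), row[1], row[2] if len(row) > 2 else ""))
        return events

    # Example: total task duration in seconds for one (hypothetical) log file.
    # log = read_log("simple/participant_1.log")
    # print((log[-1][0] - log[0][0]) / 1000.0)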

  4. Public benchmark dataset for Conformance Checking in Process Mining

    • figshare.unimelb.edu.au
    • melbourne.figshare.com
    xml
    Updated Jan 30, 2022
    Cite
    Daniel Reissner (2022). Public benchmark dataset for Conformance Checking in Process Mining [Dataset]. http://doi.org/10.26188/5cd91d0d3adaa
    Explore at:
    Available download formats: xml
    Dataset updated
    Jan 30, 2022
    Dataset provided by
    The University of Melbourne
    Authors
    Daniel Reissner
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a variety of publicly available real-life event logs. We derived two types of Petri nets for each event log with two state-of-the-art process miners: Inductive Miner (IM) and Split Miner (SM). Each event log-Petri net pair is intended for evaluating the scalability of existing conformance checking techniques. We used this dataset to evaluate the scalability of the S-Component approach for measuring fitness. The dataset contains tables of descriptive statistics of both process models and event logs. In addition, it includes results in terms of time performance, measured in milliseconds, for several approaches in both multi-threaded and single-threaded executions. Last, the dataset contains a cost comparison of different approaches and reports on the degree of over-approximation of the S-Component approach. The description of the compared conformance checking techniques can be found here: https://arxiv.org/abs/1910.09767.

    Update: The dataset has been extended with the event logs of BPIC18 and BPIC19. BPIC19 is actually a collection of four different processes and was therefore split into four event logs. For each of the additional five event logs, again, two process models have been mined with Inductive Miner and Split Miner. We used the extended dataset to test the scalability of our tandem repeats approach for measuring fitness. The dataset now contains updated tables of log and model statistics as well as tables of the conducted experiments measuring execution time and raw fitness cost of various fitness approaches. The description of the compared conformance checking techniques can be found here: https://arxiv.org/abs/2004.01781.

    Update: The dataset has also been used to measure the scalability of a new Generalization measure based on concurrent and repetitive patterns. A concurrency oracle is used in tandem with partial orders to identify concurrent patterns in the log, which are tested against parallel blocks in the process model. Tandem repeats are used with various trace reductions and extensions to define repetitive patterns in the log, which are tested against loops in the process model. Each pattern is assigned a partial fulfillment. The generalization is then the average of pattern fulfillments weighted by the trace counts for which the patterns have been observed. The dataset now includes the time results and a breakdown of Generalization values for the dataset.
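    Read literally, that weighted average can be written as a small function; this is our paraphrase of the description above, not the authors' implementation.

    # Generalization as described above: each pattern i has a fulfillment f[i] in [0, 1]
    # and is weighted by the number of traces c[i] in which it was observed.
    def generalization(fulfillments, trace_counts):
        total = sum(trace_counts)
        return sum(c * f for c, f in zip(trace_counts, fulfillments)) / total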

  5. Market Basket Analysis

    • kaggle.com
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 9, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the item sets they are most likely to purchase. I was given a dataset containing a retailer's transaction data, covering all transactions that occurred over a period of time. The retailer will use the results to grow its business and to provide customers with item-set suggestions, enabling increased customer engagement, an improved customer experience, and insight into customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association rule mining is most useful when you want to find associations between different objects in a set, and it works well when you are looking for frequent patterns in a transaction database. It can tell you which items customers frequently buy together and allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule bought computer mouse => bought mouse mat:

    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support / P(computer mouse) = 0.08/0.10 = 0.8
    • lift = confidence / P(mouse mat) = 0.8/0.09 ≈ 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
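    As a quick check of that arithmetic, here is a short Python sketch (ours; the original analysis below uses R):

    # Toy numbers from the example above: 100 customers, 10 bought a mouse,
    # 9 bought a mouse mat, 8 bought both.
    n_customers = 100
    n_mouse = 10
    n_mat = 9
    n_both = 8

    support = n_both / n_customers                    # P(mouse & mat) = 0.08
    confidence = support / (n_mouse / n_customers)    # 0.08 / 0.10 = 0.8
    lift = confidence / (n_mat / n_customers)         # 0.8 / 0.09 ≈ 8.9

    print(f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.1f}")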

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of Rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    [image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png]

    Libraries in R

    First, we need to load the required libraries. Each library is briefly described below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    [image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png]

    Data Pre-processing

    Next, we need to upload Assignment-1_Data.xlsx to R to read the dataset. Now we can see our data in R.

    [image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png] [image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png]

    Next, we clean our data frame by removing missing values.

    [image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png]

    To apply association rule mining, we need to convert the data frame into transaction data so that all items bought together in one invoice will be in ...
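    In outline, that conversion groups the items of each invoice into a single transaction. A rough Python sketch of the same step (ours; the original analysis uses R with arules, and the column names are those listed in the dataset description above):

    import pandas as pd

    # Read the retail data (file name as given in the dataset description).
    df = pd.read_excel("Assignment-1_Data.xlsx")

    # Drop rows with missing item names or customer IDs, as described above.
    df = df.dropna(subset=["Itemname", "CustomerID"])

    # Group items by invoice so that each transaction is the set of items
    # bought together on one bill (BillNo), mirroring the R transactions object.
    transactions = df.groupby("BillNo")["Itemname"].apply(set).tolist()
    print(len(transactions), "transactions")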

  6. The Maven Dependency Dataset

    • figshare.com
    • data.4tu.nl
    txt
    Updated Jul 23, 2020
    Cite
    Steven Raemaekers; A. (Arie) van Deursen; Joost Visser (2020). The Maven Dependency Dataset [Dataset]. http://doi.org/10.4121/uuid:68a0e837-4fda-407a-949e-a159546e67b6
    Explore at:
    Available download formats: txt
    Dataset updated
    Jul 23, 2020
    Dataset provided by
    4TU.ResearchData
    Authors
    Steven Raemaekers; A. (Arie) van Deursen; Joost Visser
    License

    https://doi.org/10.4121/resource:terms_of_use

    Description

    The Maven Dependency Dataset contains the data as described in the paper "Mining Metrics, Changes and Dependencies from the Maven Dependency Dataset". NOTE: See the README.TXT file for more information on the data in this dataset. The dataset consists of multiple parts:

    • A snapshot of the Maven repository dated July 30, 2011 (maven.tar.gz)
    • A MySQL database (complete.tar.gz) containing information on individual methods, classes and packages of different library versions
    • A Berkeley DB database (berkeley.tar.gz) containing metrics on all methods, classes and packages in the repository
    • A Neo4j graph database (graphdb.tar.gz) containing a call graph of the entire repository
    • Scripts and analysis files (scriptsAndData.tar.gz)
    • Source code and a binary package of the analysis software (fullmaven.jar and fullmaven-sources.jar)
    • Text dumps of data in these databases (graphdump.tar.gz, processed.tar.gz, calls.tar.gz and units.tar.gz)

  7. Data from: A Generic Local Algorithm for Mining Data Streams in Large...

    • catalog.data.gov
    • datasets.ai
    • +3more
    Updated Apr 10, 2025
    + more versions
    Cite
    Dashlink (2025). A Generic Local Algorithm for Mining Data Streams in Large Distributed Systems [Dataset]. https://catalog.data.gov/dataset/a-generic-local-algorithm-for-mining-data-streams-in-large-distributed-systems
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    In a large network of computers or wireless sensors, each of the components (henceforth, peers) has some data about the global state of the system. Much of the system's functionality, such as message routing, information retrieval and load sharing, relies on modeling the global state. We refer to the outcome of the function (e.g., the load experienced by each peer) as the model of the system. Since the state of the system is constantly changing, it is necessary to keep the models up-to-date. Computing global data mining models (e.g., decision trees, k-means clustering) in large distributed systems may be very costly due to the scale of the system and due to communication cost, which may be high. The cost further increases in a dynamic scenario when the data changes rapidly. In this paper we describe a two-step approach for dealing with these costs. First, we describe a highly efficient local algorithm which can be used to monitor a wide class of data mining models. Then, we use this algorithm as a feedback loop for the monitoring of complex functions of the data such as its k-means clustering. The theoretical claims are corroborated with a thorough experimental analysis.

  8. Global ML-ready dataset for mining areas in satellite images

    • zenodo.org
    zip
    Updated Nov 21, 2024
    Cite
    Simon Jasansky; Simon Jasansky; Victor Maus; Mirela Popa; Anna Wilbik; Anna Wilbik; Victor Maus; Mirela Popa (2024). Global ML-ready dataset for mining areas in satellite images [Dataset]. http://doi.org/10.5281/zenodo.14195737
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Simon Jasansky; Simon Jasansky; Victor Maus; Mirela Popa; Anna Wilbik; Anna Wilbik; Victor Maus; Mirela Popa
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset is a global resource for machine learning applications in mining area detection and semantic segmentation on satellite imagery. It contains Sentinel-2 satellite images and corresponding mining area masks + bounding boxes for 1,210 sites worldwide. Ground-truth masks are derived from Maus et al. (2022) and Tang et al. (2023), and validated through manual verification to ensure accurate alignment with Sentinel-2 imagery from specific timestamps.

    The dataset includes three mask variants:

    • Masks exclusively from Maus et al. (n=1,090)
    • Masks exclusively from Tang et al. (n=817)
    • A preferred mask selected from either Maus or Tang based on alignment quality determined during manual review (n=1,210).

    Each tile corresponds to a 2048x2048 pixel Sentinel-2 image, with metadata on mine type (surface, placer, underground, brine & evaporation) and scale (artisanal, industrial). For convenience, the preferred mask dataset is already split into training (75%), validation (15%), and test (10%) sets.

    Furthermore, dataset quality was validated by re-validating test set tiles manually and correcting any mismatches between mining polygons and visually observed true mining area in the images, resulting in the following estimated quality metrics:

    Metric      Combined   Maus    Tang
    Accuracy    99.78      99.74   99.83
    Precision   99.22      99.20   99.24
    Recall      95.71      96.34   95.10

    Note that the dataset does not contain the Sentinel-2 images themselves but contains a reference to specific Sentinel-2 images. Thus, for any ML applications, the images must be persisted first. For example, Sentinel-2 imagery is available from Microsoft's Planetary Computer and filterable via STAC API: https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a. Additionally, the temporal specificity of the data allows integration with other imagery sources from the indicated timestamp, such as Landsat or other high-resolution imagery.

    Source code used to generate this dataset and to use it for ML model training is available at https://github.com/SimonJasansky/mine-segmentation. It includes useful Python scripts, e.g. to download Sentinel-2 images via STAC API, or to divide tile images (2048x2048px) into smaller chips (e.g. 512x512px).
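    Since the dataset stores references to Sentinel-2 scenes rather than the pixels themselves, the imagery has to be fetched before training. The scripts in the repository above handle this; as a rough independent sketch (ours), a STAC query against the Planetary Computer using the pystac-client and planetary-computer packages, with a hypothetical bounding box and date range, could look like:

    import planetary_computer
    import pystac_client

    # Open the Planetary Computer STAC endpoint; sign_inplace adds the SAS tokens
    # needed to actually download the asset files.
    catalog = pystac_client.Client.open(
        "https://planetarycomputer.microsoft.com/api/stac/v1",
        modifier=planetary_computer.sign_inplace,
    )

    # Hypothetical tile footprint and acquisition window; in practice these come
    # from the bounding boxes and timestamps stored in this dataset.
    search = catalog.search(
        collections=["sentinel-2-l2a"],
        bbox=[30.0, -25.5, 30.3, -25.2],
        datetime="2021-01-01/2021-03-31",
        query={"eo:cloud_cover": {"lt": 20}},
    )

    for item in search.items():
        print(item.id, item.assets["visual"].href)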

    A database schema, a schematic depiction of the dataset generation process, and a map of the global distribution of tiles are provided in the accompanying images.

  9. Data-Mining-Final-Project-Data

    • search.dataone.org
    Updated Sep 24, 2024
    Cite
    Anderson, Ty Julian (2024). Data-Mining-Final-Project-Data [Dataset]. http://doi.org/10.7910/DVN/8ETVW9
    Explore at:
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Anderson, Ty Julian
    Description

    Financial News Headlines. Visit https://dataone.org/datasets/sha256%3Ade01b1cf5318d53f0296b475ff28734d90acd6240a76f1eee1df39fefda07ef0 for complete metadata about this dataset.

  10. Cv Cbi Mining Safety Dataset

    • universe.roboflow.com
    zip
    Updated May 29, 2024
    Cite
    MININGSAFETYCBI (2024). Cv Cbi Mining Safety Dataset [Dataset]. https://universe.roboflow.com/miningsafetycbi/cv-cbi-mining-safety/dataset/3
    Explore at:
    Available download formats: zip
    Dataset updated
    May 29, 2024
    Dataset authored and provided by
    MININGSAFETYCBI
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    Boots Helms Vests Boots Bounding Boxes
    Description

    Project Description for Roboflow: Mining Safety - PPE Detection

    Project Name: Mining Safety - PPE Detection

    Overview:

    The Mining Safety - PPE Detection project aims to enhance safety protocols in mining environments by leveraging computer vision technology to detect Personal Protective Equipment (PPE). This project focuses on the detection of various PPE items and the absence of mandatory safety gear to ensure that workers adhere to safety regulations, thereby minimizing the risk of accidents and injuries.

    Objective:

    To develop a robust object detection model capable of accurately identifying 13 different classes of PPE in real-time using a dataset sourced from Roboflow Universe. The ultimate goal is to integrate this model into a monitoring system that can alert supervisors about non-compliance with PPE requirements in mining sites.

    PPE Classes (Labels):

    1. Goggles
    2. Helmet
    3. Mask
    4. No-Boots
    5. No-Gloves
    6. No-Helmet
    7. No-Mask
    8. No-Vest
    9. Undefined
    10. Vest
    11. Boots
    12. Ear-Protection
    13. Gloves

    Dataset:

    • Total Images: 7444
    • Source: Roboflow Universe
    • Annotations: Each image is annotated with bounding boxes corresponding to one or more of the 13 PPE classes.
    • Image Variety: The images come from various mining sites with different lighting conditions, camera angles, and worker positions to ensure diversity and robustness of the model.

    Project Steps:

    1. Data Collection and Annotation:

      • Import and utilize the dataset from Roboflow Universe, ensuring it covers diverse conditions and scenarios.
      • Verify and, if necessary, re-annotate images to match the 13 PPE classes accurately using the Roboflow platform.
    2. Data Preprocessing:

      • Perform data augmentation techniques such as rotation, scaling, and cropping to increase the variability and size of the dataset.
      • Split the dataset into training, validation, and test sets (e.g., 80% training, 10% validation, 10% test).
    3. Model Selection and Training:

      • Use a pre-trained YOLO (You Only Look Once) model due to its efficiency and accuracy in real-time object detection tasks.
      • Fine-tune the model on the annotated dataset using transfer learning to adapt it specifically to the mining safety PPE detection task.
    4. Model Evaluation:

      • Evaluate the model's performance using metrics such as precision, recall, F1-score, and mean Average Precision (mAP).
      • Conduct error analysis to identify common misclassifications and refine the model accordingly.
    5. Deployment:

      • Integrate the trained model into a real-time monitoring system.
      • Develop a user interface that displays video feeds and highlights detected PPE and any non-compliance issues.
      • Implement alert mechanisms to notify supervisors of any detected safety violations.
    6. Continuous Improvement:

      • Collect feedback from the deployment to continuously improve the model.
      • Regularly update the dataset with new images and retrain the model to maintain high accuracy.

    Expected Outcomes:

    • A high-accuracy object detection model capable of identifying and differentiating between 13 classes of PPE.
    • Enhanced safety monitoring system for mining sites, reducing the likelihood of accidents due to non-compliance with PPE regulations.
    • A scalable solution that can be adapted to other industrial environments requiring PPE detection.

    Tools and Technologies:

    • Annotation Tool: Roboflow
    • Object Detection Model: YOLO (preferably YOLOv8 or YOLOv9 for efficiency)
    • Programming Language: Python
    • Frameworks: PyTorch or TensorFlow for model training and inference
    • Deployment Platform: Docker for containerization and deployment on edge devices or cloud platforms
    • Monitoring and Alert System: Custom-built using Flask/Django (for web interface) and integrated with real-time notification services (e.g., Slack, email, SMS)

    This project will significantly contribute to improving the safety standards in mining operations by ensuring that all workers are consistently wearing the required protective gear.
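    As a minimal sketch of the training and evaluation steps outlined above, assuming the ultralytics package and a Roboflow export in YOLO format whose data.yaml lists the 13 PPE classes (the hyperparameters and file names are illustrative, not the project's actual settings):

    from ultralytics import YOLO

    # Start from a pre-trained YOLOv8 checkpoint and fine-tune it on the
    # Roboflow-exported PPE dataset described by data.yaml.
    model = YOLO("yolov8n.pt")
    model.train(data="data.yaml", epochs=50, imgsz=640, batch=16)

    # Evaluate on the validation split: precision, recall and mAP.
    metrics = model.val()
    print(metrics.results_dict)

    # Run inference on a new frame and list detections above 50% confidence.
    results = model.predict("mine_site_frame.jpg", conf=0.5)
    for box in results[0].boxes:
        print(results[0].names[int(box.cls)], float(box.conf))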

  11. The TAWOS dataset

    • rdr.ucl.ac.uk
    txt
    Updated May 31, 2023
    Cite
    Vali Tawosi; Afnan Abdulaziz A Alsubaihin; Rebecca Moussa; Federica Sarro (2023). The TAWOS dataset [Dataset]. http://doi.org/10.5522/04/19085834.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    University College London
    Authors
    Vali Tawosi; Afnan Abdulaziz A Alsubaihin; Rebecca Moussa; Federica Sarro
    License

    https://www.apache.org/licenses/LICENSE-2.0.html

    Description

    TAWOS (Peacock in Farsi and Arabic) is a dataset of agile open-source software project issues mined from Jira repositories including many descriptive features (raw and derived). The dataset aims to be all-inclusive, making it well-suited to several research avenues, and cross-analyses therein. This dataset is described and presented in the paper "A Versatile Dataset of Agile Open Source Software Projects" authored by Vali Tawosi, Afnan Al-Subaihin, Rebecca Moussa and Federica Sarro. The paper is accepted at the 2022 Mining Software Repositories (MSR) conference. Citation information will be available soon. For further information please refer to "https://github.com/SOLAR-group/TAWOS".

  12. Grocery Store dataset for data mining

    • kaggle.com
    zip
    Updated Mar 9, 2021
    Cite
    Honey Patel (2021). Grocery Store dataset for data mining [Dataset]. https://www.kaggle.com/honeypatel2158/grocery-store-dataset-for-data-mining
    Explore at:
    Available download formats: zip (7990 bytes)
    Dataset updated
    Mar 9, 2021
    Authors
    Honey Patel
    Description

    Dataset

    This dataset was created by Honey Patel

    Contents

  13. MSR 2019 Mining Challenge Dataset

    • figshare.com
    zip
    Updated Mar 19, 2019
    Cite
    Akond Rahman (2019). MSR 2019 Mining Challenge Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.6943304.v3
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 19, 2019
    Dataset provided by
    figshare
    Authors
    Akond Rahman
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset used for paper: Snakes in Paradise?: Insecure Python-related Coding Practices in Stack Overflow

  14. Multidimensional Dataset Of Food Security And Nutrition In Cauca.

    • data.mendeley.com
    Updated Dec 6, 2021
    Cite
    David Santiago Restrepo (2021). Multidimensional Dataset Of Food Security And Nutrition In Cauca. [Dataset]. http://doi.org/10.17632/wsss65c885.1
    Explore at:
    Dataset updated
    Dec 6, 2021
    Authors
    David Santiago Restrepo
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Cauca
    Description

    A multidimensional dataset created for the department of Cauca based on public data sources is published. The dataset integrates the four FAO food security dimensions: physical availability of food, economic and physical access to food, food utilization, and the sustainability of the dimensions mentioned above. It also allows analysis of nutritional, socioeconomic, climatic, and sociodemographic variables, among others, with statistical techniques or temporal analysis. The dataset can also be used for analysis and feature extraction from satellite images with computer vision techniques, or for multimodal machine learning with data of a different nature (images and tabular data).

    The dataset contains the following folders:

    • Multidimensional dataset of Cauca/: the tabular data of the municipalities of the department of Cauca. The folder contains the files:
      1. dictionary(English).xlsx: the dictionary of the static variables for each municipality of Cauca, in English.
      2. dictionary(Español): the dictionary of the static variables for each municipality of Cauca, in Spanish.
      3. dictionary(English).xlsx: the dictionary of the static variables for each municipality of Cauca, in English.
      4. MultidimensionalDataset_AllMunicipalities.csv: nutritional, climatic, sociodemographic, socioeconomic and agricultural data of the 42 municipalities of the department of Cauca, with some null values due to missing data in the nutrition surveys of some municipalities.
    • Satellite Images Popayán/: the monthly Landsat 8 satellite images of the municipality of Popayán in Cauca. The folder contains the folders:
      1. RGB/: RGB images of the municipality of Popayán in the department of Cauca, from April 2013 to December 2020, at a resolution of 15 m/px. The title of each image is image year_month.png.
      2. 6 Band Images/: images generated from Landsat 8 bands 1 to 8 of the municipality of Popayán, stored in tif format, from April 2013 to December 2020, at a resolution of 15 m/px. The title of each image is image year_month.tif.

  15. Dataset for Erasable Itemset Mining

    • opendata.pku.edu.cn
    Updated Nov 19, 2015
    Cite
    Peking University Open Research Data Platform (2015). Dataset for Erasable Itemset Mining [Dataset]. http://doi.org/10.18170/DVN/ISHFQX
    Explore at:
    Available download formats: text/plain; charset=us-ascii (5336007), text/plain; charset=us-ascii (9764947), text/plain; charset=us-ascii (7000387)
    Dataset updated
    Nov 19, 2015
    Dataset provided by
    Peking University Open Research Data Platform
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    These three artificial datasets are for mining erasable itemsets. The definition of an erasable itemset is given in the reference papers. Note that all three datasets include 200 different items, but we did not provide a profit value for each item; users can generate profit values as required, using a normal or random distribution.

  16. Data Buffalo Toraja

    • data.mendeley.com
    Updated May 16, 2025
    Cite
    Abdul Rachman Manga (2025). Data Buffalo Toraja [Dataset]. http://doi.org/10.17632/kbft73pdkw.2
    Explore at:
    Dataset updated
    May 16, 2025
    Authors
    Abdul Rachman Manga
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    This data was captured directly in the Toraja area using a digital camera at a minimum shooting distance of 3 m, recorded as video; the footage was then divided into frames.

  17. Datasets(Original, Mean, Median, Most Frequent).zip

    • figshare.com
    zip
    Updated May 31, 2023
    Cite
    Omar Elzeki (2023). Datasets(Original, Mean, Median, Most Frequent).zip [Dataset]. http://doi.org/10.6084/m9.figshare.8118710.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Omar Elzeki
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is transformed into MATLAB format and stored as cell arrays. Each cell is a matrix whose columns represent genes and whose rows represent subjects. Each dataset is organized in a separate directory containing four versions: a) the original dataset, b) a dataset imputed by mean, c) a dataset imputed by median, and d) a dataset imputed by most frequent value.
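    A minimal Python sketch (ours) for inspecting one of these files with scipy.io; the file name and directory are hypothetical and the variable names must be checked after loading:

    from scipy.io import loadmat

    # Hypothetical file name; each directory holds Original/Mean/Median/Most Frequent versions.
    data = loadmat("Original/dataset.mat")

    # List the variables stored in the file (keys starting with "__" are metadata).
    names = [k for k in data if not k.startswith("__")]
    print(names)

    # Cell arrays load as numpy object arrays; each cell is a matrix with genes
    # as columns and subjects as rows, per the description above.
    cells = data[names[0]]
    print(cells.shape, cells.dtype)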

  18. Balinese Story Texts Dataset - Characters, Aliases, and their Classification...

    • data.mendeley.com
    Updated Mar 15, 2024
    + more versions
    Cite
    I Made Satria Bimantara (2024). Balinese Story Texts Dataset - Characters, Aliases, and their Classification [Dataset]. http://doi.org/10.17632/h2tf5ymcp9.2
    Explore at:
    Dataset updated
    Mar 15, 2024
    Authors
    I Made Satria Bimantara
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of 120 Balinese story texts (also known as Satua Bali) which have been annotated for character analysis purposes, including character identification, alias clustering, and character classification into protagonist or antagonist. The labeling involved two Balinese native speakers who were fluent in understanding Balinese story texts; one of them is an expert in the fields of sociolinguistics and macrolinguistics. Reliability and level of agreement in the dataset are measured by Cohen's kappa coefficient, Jaccard similarity coefficient, and F1-score, and all of them show almost perfect agreement values (>0.81). There are four main folders, each used for a different character analysis purpose:

    1. First Dataset (charsNamedEntity): 89,917 tokens annotated with five character named entity labels (ANM, ADJ, PNAME, GODS, OBJ) for character named entity recognition
    2. Second Dataset (charsExtraction): 6,634 annotated sentences for character identification at the sentence level
    3. Third Dataset (charsAliasClustering): 930 lists of character groups from 120 story texts for alias clustering
    4. Fourth Dataset (charsClassification): 848 lists of character groups that have been filtered into two groups (Protagonist and Antagonist)

  19. Bias-Free Dataset of Food Delivery App Reviews with Data Poisoning Attacks

    • data.mendeley.com
    Updated Oct 13, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hyunggu Jung (2023). Bias-Free Dataset of Food Delivery App Reviews with Data Poisoning Attacks [Dataset]. http://doi.org/10.17632/rnyrpzyw3h.1
    Explore at:
    Dataset updated
    Oct 13, 2023
    Authors
    Hyunggu Jung
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of reviews collected from restaurants on a Korean delivery app platform running review events. A total of 128,668 reviews were collected from 136 restaurants by crawling reviews using the Selenium library in Python. The 136 chosen restaurants run review events that ask customers to write reviews with five-star ratings and photos, so the data was annotated by considering 1) whether the review gives a five-star rating, and 2) whether the review contains photo(s).

  20. OpenScience Slovenia document metadata dataset

    • data.mendeley.com
    • narcis.nl
    Updated Nov 5, 2019
    Cite
    Mladen Borovič (2019). OpenScience Slovenia document metadata dataset [Dataset]. http://doi.org/10.17632/7wh9xvvmgk.1
    Explore at:
    Dataset updated
    Nov 5, 2019
    Authors
    Mladen Borovič
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Area covered
    Slovenia
    Description

    The OpenScience Slovenia metadata dataset contains metadata entries for Slovenian public domain academic documents, which include undergraduate and postgraduate theses, research and professional articles, along with other academic document types. The data within the dataset was collected as part of the establishment of the Slovenian Open-Access Infrastructure, which defined a unified document collection process and cataloguing for universities in Slovenia within the infrastructure repositories. The data was collected from several already established but separate library systems in Slovenia and merged into a single metadata scheme using metadata deduplication and merging techniques. It consists of text and numerical fields representing attributes that describe documents, including document titles, keywords, abstracts, typologies, authors, issue years, and other identifiers such as URL and UDC. The potential of this dataset lies especially in text mining and text classification tasks; it can also be used in the development or benchmarking of content-based recommender systems on real-world data.
