66 datasets found
  1. Data from: An open dataset for intelligent recognition and classification of...

    • springernature.figshare.com
    bin
    Updated Jul 8, 2024
    Cite
    Xuhui Zhang; Wenjuan Yang; Bing Ma; Yanqun Wang; Yujia Wu; Jianxin Yan; Yongwei Liu; Chao Zhang; Jicheng Wan; Yue Wang; Mengyao Huang; Yuyang Li; Dian Zhao (2024). An open dataset for intelligent recognition and classification of abnormal condition in longwall mining [Dataset]. http://doi.org/10.6084/m9.figshare.22654945.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Jul 8, 2024
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Xuhui Zhang; Wenjuan Yang; Bing Ma; Yanqun Wang; Yujia Wu; Jianxin Yan; Yongwei Liu; Chao Zhang; Jicheng Wan; Yue Wang; Mengyao Huang; Yuyang Li; Dian Zhao
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This work developed an image dataset of the underground longwall mining face (DsLMF+), which consists of 138,004 images annotated with 6 categories: mine personnel, hydraulic support guard plate, large coal, towline, miners' behaviour, and mine safety helmet. All labels are publicly available in YOLO and COCO formats. The dataset aims to support further research on the intelligent identification and classification of abnormal conditions in underground mining.
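Since the labels ship in COCO format, here is a minimal inspection sketch (assumptions: you downloaded the COCO-style labels and point the script at one JSON annotation file; the name `annotations.json` is a placeholder):

```python
import json
from collections import Counter

# placeholder filename; use the actual COCO annotation file from the download
with open("annotations.json", encoding="utf-8") as fh:
    coco = json.load(fh)

# map category ids to the six class names and count boxes per class
names = {c["id"]: c["name"] for c in coco["categories"]}
counts = Counter(names[a["category_id"]] for a in coco["annotations"])
print(len(coco["images"]), "images")
print(counts.most_common())
```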

  2. Data Mining Dataset

    • universe.roboflow.com
    zip
    Updated Aug 4, 2023
    Cite
    ilham project (2023). Data Mining Dataset [Dataset]. https://universe.roboflow.com/ilham-project/data-mining-n52lu/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 4, 2023
    Dataset authored and provided by
    ilham project
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Uangrupiah Bounding Boxes
    Description

    Data Mining

## Overview

Data Mining is a dataset for object detection tasks. It contains Uangrupiah annotations for 692 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model (a download sketch follows below).

## License

This dataset is available under the [CC0 1.0 Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).
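A download sketch using the `roboflow` Python package (the workspace and project slugs are taken from the citation URL above; the API key is a placeholder):

```python
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")  # placeholder; use your own key
project = rf.workspace("ilham-project").project("data-mining-n52lu")
dataset = project.version(1).download("coco")  # or another supported format
print(dataset.location)  # local folder containing images and labels
```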
    
  3. Data Mining Test Dataset

    • universe.roboflow.com
    zip
    Updated Oct 20, 2025
    Cite
    ons (2025). Data Mining Test Dataset [Dataset]. https://universe.roboflow.com/ons-eykpy/data-mining-test-fjlw4/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 20, 2025
    Dataset authored and provided by
    ons
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Cars Damage Cars Bounding Boxes
    Description

    Data Mining Test

## Overview

Data Mining Test is a dataset for object detection tasks. It contains Cars Damage Cars annotations for 382 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  4. Dataset for training classifiers of comparative sentences

    • live.european-language-grid.eu
    csv
    Updated Apr 19, 2024
    Cite
    (2024). Dataset for training classifiers of comparative sentences [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7607
    Explore at:
    Available download formats: csv
    Dataset updated
    Apr 19, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

As there was no large publicly available cross-domain dataset for comparative argument mining, we created one composed of sentences annotated with BETTER / WORSE markers (the first object is better / worse than the second object) or NONE (the sentence does not contain a comparison of the target objects). BETTER sentences stand for a pro-argument in favor of the first compared object; WORSE sentences represent a con-argument and favor the second object.

We aimed to minimize domain-specific biases in the dataset in order to capture the nature of comparison rather than the nature of the particular domains, and thus decided to control the specificity of domains through the selection of comparison targets. We hypothesized, and confirmed in preliminary experiments, that comparison targets usually have a common hypernym (i.e., are instances of the same class), which we utilized for selecting the compared object pairs.

The most specific domain we chose is computer science, with comparison targets such as programming languages, database products, and technology standards like Bluetooth or Ethernet. Many computer science concepts can be compared objectively (e.g., on transmission speed or suitability for certain applications). The objects for this domain were manually extracted from "List of"-articles on Wikipedia. In the annotation process, annotators were asked to label sentences from this domain only if they had some basic knowledge of computer science. The second, broader domain is brands. It contains objects of different types (e.g., cars, electronics, and food). As brands are present in everyday life, anyone should be able to label the majority of sentences containing well-known brands such as Coca-Cola or Mercedes. Again, targets for this domain were manually extracted from "List of"-articles on Wikipedia. The third domain, random, is not restricted to any topic: for each of 24 randomly selected seed words, 10 similar words were collected based on the distributional similarity API of JoBimText (http://www.jobimtext.org). The seed words were created using randomlists.com: book, car, carpenter, cellphone, Christmas, coffee, cork, Florida, hamster, hiking, Hoover, Metallica, NBC, Netflix, ninja, pencil, salad, soccer, Starbucks, sword, Tolkien, wine, wood, XBox, Yale.

Especially for brands and computer science, the resulting object lists were large (4493 objects for brands and 1339 for computer science). In a manual inspection, low-frequency and ambiguous objects were removed from all object lists (e.g., RAID (a hardware concept) and Unity (a game engine) are also regularly used nouns). The remaining objects were combined into pairs: for each object type (seed Wikipedia list page or seed word), all possible combinations were created. These pairs were then used to find sentences containing both objects. This approach to selecting compared object pairs tends to minimize the inclusion of domain-specific data but does not solve the problem fully; we leave extending the dataset with more diverse object pairs, including abstract concepts, for future work.

For sentence mining, we used the publicly available index of dependency-parsed sentences from the Common Crawl corpus, containing over 14 billion English sentences filtered for duplicates. This index was queried for sentences containing both objects of each pair.

For 90% of the pairs, we also added comparative cue words (better, easier, faster, nicer, wiser, cooler, decent, safer, superior, solid, terrific, worse, harder, slower, poorly, uglier, poorer, lousy, nastier, inferior, mediocre) to the query in order to bias the selection towards comparisons, while still admitting comparisons that do not contain any of the anticipated cues. This was necessary because random sampling would have yielded only a very tiny fraction of comparisons. Note that even sentences containing a cue word do not necessarily express a comparison between the desired targets (dog vs. cat: "He's the best pet that you can get, better than a dog or cat."). It is thus especially crucial to enable a classifier to learn not to rely on the existence of cue words alone (very likely in a random sample of sentences with very few comparisons). For our corpus, we kept pairs with at least 100 retrieved sentences.

From all sentences of those pairs, 2500 per category were randomly sampled as candidates for a crowdsourced annotation that we conducted on figure-eight.com in several small batches. Each sentence was annotated by at least five trusted workers. We ranked annotations by confidence, figure-eight's internal measure combining annotator trust and voting, and discarded annotations with a confidence below 50%. Of all annotated items, 71% received unanimous votes, and for over 85% at least 4 out of 5 workers agreed, rendering the collection procedure, which aimed at ease of annotation, successful.

The final dataset contains 7199 sentences with 271 distinct object pairs. The majority of sentences (over 72%) are non-comparative despite biasing the selection with cue words; in 70% of the comparative sentences, the favored target is named first.

You can browse through the data here: https://docs.google.com/spreadsheets/d/1U8i6EU9GUKmHdPnfwXEuBxi0h3aiRCLPRC-3c9ROiOE/edit?usp=sharing

A full description of the dataset is available in the workshop paper at the ACL 2019 conference. Please cite this paper if you use the data: Franzek, Mirco, Alexander Panchenko, and Chris Biemann. "Categorization of Comparative Sentences for Argument Mining." arXiv preprint arXiv:1809.06152 (2018).

@inproceedings{franzek2018categorization,
  title={Categorization of Comparative Sentences for Argument Mining},
  author={Panchenko, Alexander and Bondarenko, Alexander and Franzek, Mirco and Hagen, Matthias and Biemann, Chris},
  booktitle={Proceedings of the 6th Workshop on Argument Mining at ACL 2019},
  year={2019},
  address={Florence, Italy}
}
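As a quick-start sketch for working with the CSV release (the filename and column names here are assumptions; check the actual header before relying on them):

```python
import pandas as pd

# hypothetical filename/columns; the file ships as CSV per the catalog entry
df = pd.read_csv("comparative_sentences.csv")
print(df.shape)                                  # expect 7199 rows
print(df["label"].value_counts(normalize=True))  # BETTER / WORSE / NONE shares
```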

  5. ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of...

    • data.niaid.nih.gov
    • elki-project.github.io
    • +1more
    Updated May 2, 2024
    Cite
    Schubert, Erich; Zimek, Arthur (2024). ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6355683
    Explore at:
    Dataset updated
    May 2, 2024
    Dataset provided by
    Ludwig-Maximilians-Universität München
    Authors
    Schubert, Erich; Zimek, Arthur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These data sets were originally created for the following publications:

    M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek Can Shared-Neighbor Distances Defeat the Curse of Dimensionality? In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.

    H.-P. Kriegel, E. Schubert, A. Zimek Evaluation of Multiple Clustering Solutions In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with ECML PKDD 2011, Athens, Greece, 2011.

    The outlier data set versions were introduced in:

    E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel On Evaluation of Outlier Rankings and Outlier Scores In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.

    They are derived from the original image data available at https://aloi.science.uva.nl/

    The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, The Amsterdam library of object images, Int. J. Comput. Vision, 61(1), 103-112, January, 2005

    Additional information is available at: https://elki-project.github.io/datasets/multi_view

    The following views are currently available:

| Feature type | Description | Files |
| --- | --- | --- |
| Object number | Sparse 1000-dimensional vectors that give the true object assignment | objs.arff.gz |
| RGB color histograms | Standard RGB color histograms (uniform binning) | aloi-8d.csv.gz, aloi-27d.csv.gz, aloi-64d.csv.gz, aloi-125d.csv.gz, aloi-216d.csv.gz, aloi-343d.csv.gz, aloi-512d.csv.gz, aloi-729d.csv.gz, aloi-1000d.csv.gz |
| HSV color histograms | Standard HSV/HSB color histograms in various binnings | aloi-hsb-2x2x2.csv.gz, aloi-hsb-3x3x3.csv.gz, aloi-hsb-4x4x4.csv.gz, aloi-hsb-5x5x5.csv.gz, aloi-hsb-6x6x6.csv.gz, aloi-hsb-7x7x7.csv.gz, aloi-hsb-7x2x2.csv.gz, aloi-hsb-7x3x3.csv.gz, aloi-hsb-14x3x3.csv.gz, aloi-hsb-8x4x4.csv.gz, aloi-hsb-9x5x5.csv.gz, aloi-hsb-13x4x4.csv.gz, aloi-hsb-14x5x5.csv.gz, aloi-hsb-10x6x6.csv.gz, aloi-hsb-14x6x6.csv.gz |
| Color similarity | Average similarity to 77 reference colors (not histograms): 18 colors × 2 saturations × 2 brightnesses + 5 grey values (incl. white, black) | aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other) |
| Haralick features | First 13 Haralick features (radius 1 pixel) | aloi-haralick-1.csv.gz |
| Front to back | Vectors representing front faces vs. back faces of individual objects | front.arff.gz |
| Basic light | Vectors indicating basic light situations | light.arff.gz |
| Manual annotations | Manually annotated object groups of semantically related objects, such as cups | manual1.arff.gz |

    Outlier Detection Versions

    Additionally, we generated a number of subsets for outlier detection:

| Feature type | Description | Files |
| --- | --- | --- |
| RGB histograms | Downsampled to 100000 objects (553 outliers) | aloi-27d-100000-max10-tot553.csv.gz, aloi-64d-100000-max10-tot553.csv.gz |
| RGB histograms | Downsampled to 75000 objects (717 outliers) | aloi-27d-75000-max4-tot717.csv.gz, aloi-64d-75000-max4-tot717.csv.gz |
| RGB histograms | Downsampled to 50000 objects (1508 outliers) | aloi-27d-50000-max5-tot1508.csv.gz, aloi-64d-50000-max5-tot1508.csv.gz |
  6. Procure-To-Payment (P2P) Object-centric Event Log in OCEL 2.0 Standard

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    bin, json, xml
    Updated Oct 7, 2023
    Cite
    Gyunam Park; Leah Tacke genannt Unterberg (2023). Procure-To-Payment (P2P) Object-centric Event Log in OCEL 2.0 Standard [Dataset]. http://doi.org/10.5281/zenodo.8412920
    Explore at:
    Available download formats: json, xml, bin
    Dataset updated
    Oct 7, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Gyunam Park; Leah Tacke genannt Unterberg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Short Description

    This process describes the Procure-To-Pay (P2P) procedure within an organization, starting from the initiation of a purchase requirement up to the execution of payment. This simulation extensively uses genuine SAP transactions and object types to offer a realistic representation of the P2P process.

    Overview

    Within our simulated organization:

    • Procurement Initiatives: The procurement journey begins when a department or individual recognizes a need and creates a Purchase Requisition using transaction ME51N.

    • Approval Process: Before the purchase can proceed, the requisition must be approved. This is carried out using transaction ME54N. Given the nature of our simulation, there may be instances where the approval process takes an unusually long time, exemplifying the Lengthy Approval Process behavior.

    • Vendor Interactions:

      • Upon approval, a Request for Quotation is sent out to potential vendors using transaction ME41.
      • Vendors then submit their quotations, which are maintained in the system using transaction ME47.
    • Purchase Order Creation: Once a vendor's quotation is selected, a Purchase Order is created using transaction ME21N. The purchase order is then subjected to an internal approval process (ME29N). Occasionally, maverick buying—where purchases are made without proper authorization—can be observed.

    • Goods & Invoice Management:

      • When the goods are received, a Goods Receipt is recorded using transaction MIGO.
      • Invoices from vendors are then received and recorded. A three-way match, which checks the purchase order, goods receipt, and invoice for discrepancies, is performed using transaction MRBR.
    • Payment: Once everything is verified, payments are executed using transaction F110. However, there may be instances of Duplicate Payments in our simulation, where the system mistakenly pays the same invoice more than once.

    Special Behaviors:

    • Maverick Buying: Unauthorized purchases, bypassing the standard procedure.
    • Duplicate Payments: An error leading to the same invoice being paid multiple times.
    • Lengthy Approval Process: Delays in approving purchase requisitions or purchase orders, which might lead to operational inefficiencies.

    General Properties

    An overview of log properties is given below.

| Property | Value |
| --- | --- |
| Event Types | 10 |
| Object Types | 7 |
| Events | 14671 |
| Objects | 9543 |
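Because the log follows the OCEL 2.0 standard, it can be inspected with an OCEL-aware library. A sketch using pm4py (assumptions: a recent pm4py release that ships the `read_ocel2_xml` reader; the filename is a placeholder for the XML download):

```python
import pm4py

ocel = pm4py.read_ocel2_xml("p2p_ocel2.xml")  # placeholder filename
print(len(ocel.events), "events")    # expected: 14671
print(len(ocel.objects), "objects")  # expected: 9543
print(ocel.objects["ocel:type"].unique())  # the 7 object types
```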

    Authors

    Gyunam Park and Leah Tacke genannt Unterberg

    Contributing

    To contribute, drop us an email! We are happy to receive your feedback.

  7. Atlanta, Georgia - Aerial imagery object identification dataset for building...

    • figshare.com
    tiff
    Updated Jun 1, 2023
    + more versions
    Cite
    Kyle Bradbury; Benjamin Brigman; Leslie Collins; Timothy Johnson; Sebastian Lin; Richard Newell; Sophia Park; Sunith Suresh; Hoel Wiesner; Yue Xi (2023). Atlanta, Georgia - Aerial imagery object identification dataset for building and road detection, and building height estimation [Dataset]. http://doi.org/10.6084/m9.figshare.3504308.v1
    Explore at:
    Available download formats: tiff
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Kyle Bradbury; Benjamin Brigman; Leslie Collins; Timothy Johnson; Sebastian Lin; Richard Newell; Sophia Park; Sunith Suresh; Hoel Wiesner; Yue Xi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Georgia, Atlanta
    Description

    This dataset is part of the larger data collection, “Aerial imagery object identification dataset for building and road detection, and building height estimation”, linked to in the references below and can be accessed here: https://dx.doi.org/10.6084/m9.figshare.c.3290519. For a full description of the data, please see the metadata: https://dx.doi.org/10.6084/m9.figshare.3504413.

    Imagery data are from the United States Geological Survey (USGS); building and road shapefiles are from OpenStreetMap (OSM) (these OSM data are made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/); and the lidar data are from the U.S. National Oceanic and Atmospheric Administration (NOAA) and the Texas Natural Resources Information System (TNRIS).

  8. Repartition of part of visdrone2019 dataset

    • data.niaid.nih.gov
    Updated Nov 24, 2022
    Cite
    Gang Liu (2022). Repartition of part of visdrone2019 dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7355397
    Explore at:
    Dataset updated
    Nov 24, 2022
    Authors
    Gang Liu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The VisDrone2019 dataset was collected by the AISKYEYE team at the Lab of Machine Learning and Data Mining, Tianjin University, China. It contains a large number of objects in urban and rural road scenes (10 categories such as pedestrians, vehicles, and bicycles), covering a wide variety of scenes and containing many small objects. The original dataset is available at https://github.com/VisDrone/VisDrone-Dataset. We selected the training set of the object detection part as our data set and randomly divided it into new training, validation, and test sets at a ratio close to 7:2:1.
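A sketch of such a repartition (the author's exact shuffling procedure and seed are not stated, so this is only an illustrative reimplementation):

```python
import random

def split_7_2_1(items, seed=0):
    """Shuffle a copy of the item list and cut it into ~70/20/10% parts."""
    rng = random.Random(seed)  # fixed seed for reproducibility; the original is unknown
    items = list(items)
    rng.shuffle(items)
    n = len(items)
    a, b = int(0.7 * n), int(0.9 * n)
    return items[:a], items[a:b], items[b:]

# usage: train, val, test = split_7_2_1(image_paths)
```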

  9. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    Available download formats: zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all transactions that occurred over a period of time. The retailer will use the results to grow its business by providing customers with itemset suggestions, increasing customer engagement, improving the customer experience, and identifying customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rules are most useful when you want to discover associations between different objects in a set, such as frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat": support = P(mouse & mat) = 8/100 = 0.08; confidence = support / P(mouse) = 0.08/0.10 = 0.8; lift = confidence / P(mat) = 0.8/0.09 ≈ 8.9. This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rule

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.


    Libraries in R

    First, we need to load the required libraries; each is briefly described below.

    • arules - Provides the infrastructure for representing, manipulating, and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets, including several interactive visualizations for rule exploration.
    • tidyverse - An opinionated collection of R packages designed for data science; makes it easy to install and load multiple 'tidyverse' packages in a single step.
    • readxl - Read Excel files in R.
    • plyr - Tools for splitting, applying, and combining data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a forward-pipe operator, %>%, which forwards a value, or the result of an expression, into the next function call/expression; there is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data-frame-like objects, both in memory and out of memory.


    Data Pre-processing

    Next, we upload Assignment-1_Data.xlsx to R and read the dataset. Now we can see our data in R.


    Next, we clean the data frame by removing missing values.


    To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...
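The walkthrough above is truncated. As an illustrative Python equivalent of the described R arules pipeline (an assumption, not the author's code; it uses mlxtend, and the column names follow the attribute list above):

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# load and clean, mirroring the R steps above
df = pd.read_excel("Assignment-1_Data.xlsx").dropna(subset=["BillNo", "Itemname"])

# one row per invoice, one boolean column per item
basket = (df.groupby(["BillNo", "Itemname"])["Quantity"].sum()
            .unstack(fill_value=0) > 0)

itemsets = apriori(basket, min_support=0.01, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.5)
print(rules.sort_values("lift", ascending=False).head())
```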

  10. Comprehensive Ethereum Execution Data for Object-Centric Process Mining of...

    • cryptodata.center
    Updated Dec 4, 2024
    Cite
    (2024). Comprehensive Ethereum Execution Data for Object-Centric Process Mining of DApps - Dataset - CryptoData Hub [Dataset]. https://cryptodata.center/dataset/comprehensive-ethereum-execution-data-for-object-centric-process-mining-of-dapps
    Explore at:
    Dataset updated
    Dec 4, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

The dataset pertains to the collection and analysis of blockchain execution data, particularly from Ethereum-based Decentralized Applications (DApps). This data includes transactions, transaction receipts, and detailed transaction traces, documenting the execution steps performed by the Ethereum Virtual Machine (EVM). Such traces are essential for understanding the interaction between smart contracts and accounts, including Contract Accounts (CAs) and Externally Owned Accounts (EOAs).

A blockchain is an append-only ledger that chronologically records data in blocks. Each block contains transactions that signify state transitions, and transaction receipts that provide a hashed result of these transitions to ensure uniform results across different executions. The dataset includes a classification of Ethereum accounts, detailing the functions and interactions between EOAs and CAs, where CAs deploy and execute smart contract code.

The dataset captures the granular operational data of blockchain transactions, such as function calls, contract creations, and log entries generated by smart contracts. These details are crucial for creating object-centric event logs, aiding in process mining and analysis to bridge the gap between theoretical process models and actual execution.

Contract creations and function calls are fundamental components of the dataset. The former documents the deployment of smart contracts, including the mechanics of contract updates and additions through various design patterns. Function calls between accounts are also extensively logged, providing insights into the flow of Ethereum's native token, Ether, and other transactional data within the blockchain.

Delegated calls and log entries represent more specialized interactions within Ethereum: delegated calls allow contracts to use code from other contracts to manipulate their own state, supporting upgradeable contract designs, while log entries, specified within smart contract code, facilitate the communication of contract execution details to external systems.

To handle the diverse and dynamic nature of blockchain data, the dataset employs the Object-Centric Event Log (OCEL) format. This format accommodates multiple object types in a single log, addressing issues such as event divergence and convergence that are typical of traditional single-case logs. The latest version, OCEL 2.0, supports documenting dynamic object roles and relationships, improving the fidelity of logs in capturing blockchain operations.

In summary, the dataset is structured to support a comprehensive analysis of blockchain behaviors, particularly focusing on Ethereum DApps. It is tailored to assist researchers and practitioners in understanding and analyzing the decentralized execution of smart contracts and the associated data flows within the blockchain environment.

  11. VisDrone Dataset for Drone-Based Computer Vision

    • kaggle.com
    zip
    Updated Sep 24, 2024
    Cite
    Evil Spirit05 (2024). VisDrone Dataset for Drone-Based Computer Vision [Dataset]. https://www.kaggle.com/datasets/evilspirit05/visdrone/data
    Explore at:
    Available download formats: zip (1990878150 bytes)
    Dataset updated
    Sep 24, 2024
    Authors
    Evil Spirit05
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description
    The VisDrone Dataset is a comprehensive benchmark developed by the AISKYEYE team at the Lab of Machine Learning and Data Mining, Tianjin University, China. Designed for various computer vision tasks associated with drone-based image and video analysis, the dataset serves as an essential resource for researchers and practitioners in the field.
    

    Key Features

    • Extensive Collection: The dataset comprises 288 video clips containing 261,908 frames and 10,209 static images, all captured using different drone-mounted cameras. This extensive collection showcases a wide range of environments, objects, and scenarios.
    • Diverse Environments: VisDrone encompasses images and videos from 14 cities across China, covering both urban and rural settings. This diversity enhances the dataset's applicability to various real-world applications.
    • Varied Object Categories: The dataset features a rich array of object categories, including pedestrians, vehicles, bicycles, and tricycles. This variety allows for robust training and evaluation of models across multiple object detection tasks.
    • High-Quality Annotations: With over 2.6 million manually annotated bounding boxes, the VisDrone Dataset provides detailed ground truth data for object detection, tracking, and crowd counting tasks. Annotations also include attributes such as scene visibility, object class, and occlusion, enabling researchers to develop more effective models.

    Dataset Structure

    The VisDrone dataset is organized into five main subsets, each targeting a specific task:

    • Task 1: Object Detection in Images
    • Task 2: Object Detection in Videos
    • Task 3: Single-Object Tracking
    • Task 4: Multi-Object Tracking
    • Task 5: Crowd Counting

    This structured approach facilitates focused training and evaluation for distinct computer vision challenges.

    Applications

    The VisDrone Dataset is widely used for training and evaluating deep learning models in various drone-based computer vision tasks, including:
    
    • Object Detection: Identifying and localizing multiple object classes in images and videos.
    • Object Tracking: Following individual objects across frames in video sequences, enabling applications in surveillance and traffic monitoring.
    • Crowd Counting: Estimating the number of individuals in crowded scenes, which is valuable for urban planning and safety assessments.

    Conclusion

    The VisDrone Dataset stands out as a significant contribution to the field of drone-based computer vision. Its diverse sensor data, extensive annotations, and various task-focused subsets make it a valuable resource for advancing research and development in drone applications. Whether for academic research or practical implementations, the VisDrone Dataset is instrumental in fostering innovation in the rapidly evolving domain of drone technology.
    
  12. Asbest veins in the open pit conditions

    • data.mendeley.com
    Updated Dec 12, 2022
    Cite
    Mikhail Ronkin (2022). Asbest veins in the open pit conditions [Dataset]. http://doi.org/10.17632/y2jfk63tpd.1
    Explore at:
    Dataset updated
    Dec 12, 2022
    Authors
    Mikhail Ronkin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The database includes 1660 images of asbestos rock chunks with asbestos veins, taken under different weather and daytime conditions. All data were collected at the Bazhenovskoye field, Russia. All data are labeled for instance segmentation (as well as object detection and semantic segmentation) problems, with labels in the COCO format. The archive contains all data in the images folder and annotations in the annotations folder. The labeling was performed manually in the CVAT software. The image size is 2592 × 2048.

  13. Make Data Count Dataset - MinerU Extraction

    • kaggle.com
    zip
    Updated Aug 26, 2025
    Cite
    Omid Erfanmanesh (2025). Make Data Count Dataset - MinerU Extraction [Dataset]. https://www.kaggle.com/datasets/omiderfanmanesh/make-data-count-dataset-mineru-extraction
    Explore at:
    Available download formats: zip (4272989320 bytes)
    Dataset updated
    Aug 26, 2025
    Authors
    Omid Erfanmanesh
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Description

    This dataset contains PDF-to-text conversions of scientific research articles, prepared for the task of data citation mining. The goal is to identify references to research datasets within full-text scientific papers and classify them as Primary (data generated in the study) or Secondary (data reused from external sources).

    The PDF articles were processed using MinerU, which converts scientific PDFs into structured machine-readable formats (JSON, Markdown, images). This ensures participants can access both the raw text and layout information needed for fine-grained information extraction.

    Files and Structure

    Each paper directory contains the following files:

    • *_origin.pdf The original PDF file of the scientific article.

    • *_content_list.json Structured extraction of the PDF content, where each object represents a text or figure element with metadata. Example entry:

      {
       "type": "text",
       "text": "10.1002/2017JC013030",
       "text_level": 1,
       "page_idx": 0
      }
      
    • full.md The complete article content in Markdown format (linearized for easier reading).

    • images/ Folder containing figures and extracted images from the article.

    • layout.json Page layout metadata, including positions of text blocks and images.

    Data Mining Task

    The aim is to detect dataset references in the article text and classify them:

    Each dataset mention must be labeled as:

    • Primary: Data generated by the paper (new experiments, field observations, sequencing runs, etc.).
    • Secondary: Data reused from external repositories or prior studies.

    Training and Test Splits

    • train/ → Articles with gold-standard labels (train_labels.csv).
    • test/ → Articles without labels, used for evaluation.
    • train_labels.csv → Ground truth with:

      • article_id: Research paper DOI.
      • dataset_id: Extracted dataset identifier.
      • type: Citation type (Primary / Secondary).
    • sample_submission.csv → Example submission format.

    Example

    Paper: https://doi.org/10.1098/rspb.2016.1151
    Data: https://doi.org/10.5061/dryad.6m3n9
    In-text span: "The data we used in this publication can be accessed from Dryad at doi:10.5061/dryad.6m3n9."
    Citation type: Primary

    This dataset enables participants to develop and test NLP systems for:

    • Information extraction (locating dataset mentions).
    • Identifier normalization (mapping mentions to persistent IDs).
    • Citation classification (distinguishing Primary vs Secondary data usage).
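As a starting point for the mention-extraction step, here is a minimal sketch (the directory layout follows the file list above; the DOI regex is a common approximation, and the helper name is hypothetical):

```python
import json
import re
from pathlib import Path

DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

def dataset_mentions(paper_dir):
    """Yield (doi, page_idx) pairs found in a paper's MinerU content list."""
    for path in Path(paper_dir).glob("*_content_list.json"):
        for block in json.loads(path.read_text(encoding="utf-8")):
            if block.get("type") == "text":
                for doi in DOI_RE.findall(block.get("text", "")):
                    yield doi, block.get("page_idx")
```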
  14. Zenodo Open Metadata snapshot - Training dataset for records and communities...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, bin
    Updated Dec 15, 2022
    + more versions
    Cite
    Zenodo team (2022). Zenodo Open Metadata snapshot - Training dataset for records and communities classifier building [Dataset]. http://doi.org/10.5281/zenodo.7438358
    Explore at:
    Available download formats: bin, application/gzip
    Dataset updated
    Dec 15, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Zenodo team
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains Zenodo's published open access records and communities metadata, including entries marked by the Zenodo staff as spam and deleted.

    The datasets are gzipped compressed JSON-lines files, where each line is a JSON object representation of a Zenodo record or community.

    Records dataset

    Filename: zenodo_open_metadata_{ date of export }.jsonl.gz

    Each object contains the terms: part_of, thesis, description, doi, meeting, imprint, references, recid, alternate_identifiers, resource_type, journal, related_identifiers, title, subjects, notes, creators, communities, access_right, keywords, contributors, publication_date

    which correspond to the fields with the same name available in Zenodo's record JSON Schema at https://zenodo.org/schemas/records/record-v1.0.0.json.

    In addition, some terms have been altered:

    • The term files contains a list of dictionaries containing filetype, size, and filename only.
    • The term license contains a short Zenodo ID of the license (e.g. "cc-by").

    Communities dataset

    Filename: zenodo_community_metadata_{ date of export }.jsonl.gz

    Each object contains the terms: id, title, description, curation_policy, page

    which correspond to the fields with the same name available in Zenodo's community creation form.

    Notes for all datasets

    For each object the term spam contains a boolean value, determining whether a given record/community was marked as spam content by Zenodo staff.

    Some top-level terms that were missing in the metadata may contain a null value.

    A smaller uncompressed random sample of 200 JSON lines is also included for each dataset to test and get familiar with the format without having to download the entire dataset.
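A minimal streaming sketch for either dump (the export date in the filename is an example; substitute the actual one):

```python
import gzip
import json

def iter_jsonl_gz(path):
    # one JSON object (record or community) per line; stream to avoid loading it all
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            yield json.loads(line)

# usage: count entries flagged as spam by Zenodo staff
n_spam = sum(1 for rec in iter_jsonl_gz("zenodo_open_metadata_2022-12-15.jsonl.gz")
             if rec.get("spam"))
```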

  15. Coal Miners Detection

    • kaggle.com
    zip
    Updated Sep 18, 2023
    Cite
    Unique Data (2023). Coal Miners Detection [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/miners-detection
    Explore at:
    Available download formats: zip (5795006 bytes)
    Dataset updated
    Sep 18, 2023
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Miners Object Detection dataset

    The dataset consists of photos captured within various mines, focusing on miners engaged in their work. Each photo is annotated with bounding boxes around the miners, and an attribute indicates whether each miner is sitting or standing.

    💴 For commercial usage: to discuss your requirements, learn about the price, and buy the dataset, leave a request on our website.

    The dataset's diverse applications such as computer vision, safety assessment and others make it a valuable resource for researchers, employers, and policymakers in the mining industry.


    Get the Dataset

    This is just an example of the data

    Leave a request on https://trainingdata.pro/datasets to discuss your requirements, learn about the price and buy the dataset

    Dataset structure

    • images - contains the original images of miners
    • boxes - includes bounding box labeling for the original images
    • annotations.xml - contains the coordinates of the bounding boxes and labels created for the original photos

    Data Format

    Each image from the images folder is accompanied by an XML annotation in the annotations.xml file indicating the coordinates of the bounding boxes for miner detection. For each point, the x and y coordinates are provided. The position of the miner is also provided by the attribute is_sitting (true, false).

    Example of XML file structure

    [screenshot: example of the annotations.xml structure]
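Since the screenshot may not render outside Kaggle, here is a parsing sketch (assumption: a CVAT-style annotations.xml, with box elements carrying xtl/ytl/xbr/ybr attributes and an is_sitting attribute child, as the description suggests):

```python
import xml.etree.ElementTree as ET

root = ET.parse("annotations.xml").getroot()
for image in root.iter("image"):
    for box in image.iter("box"):
        attr = box.find("attribute[@name='is_sitting']")
        print(image.get("name"),
              box.get("xtl"), box.get("ytl"), box.get("xbr"), box.get("ybr"),
              attr.text if attr is not None else "unknown")
```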

    Miners detection might be made in accordance with your requirements.

    🧩 This is just an example of the data. Leave a request here to learn more

    🚀 You can learn more about our high-quality unique datasets here

    keywords: coal mines, underground, safety monitoring system, safety dataset, manufacturing dataset, industrial safety database, health and safety dataset, quality control dataset, quality assurance dataset, annotations dataset, computer vision dataset, image dataset, object detection, human images, classification

  16. US Deep Learning Market Analysis, Size, and Forecast 2025-2029

    • technavio.com
    pdf
    Updated Jul 8, 2025
    Cite
    Technavio (2025). US Deep Learning Market Analysis, Size, and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/us-deep-learning-market-industry-analysis
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jul 8, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Description


    US Deep Learning Market Size 2025-2029

    The deep learning market size in US is forecast to increase by USD 5.02 billion at a CAGR of 30.1% between 2024 and 2029.

    The deep learning market is experiencing robust growth, driven by the increasing adoption of artificial intelligence (AI) across industries for advanced solutions. This trend is fueled by the availability of vast amounts of data, a key requirement for deep learning algorithms to function effectively. Industry-specific solutions are gaining traction as businesses seek to leverage deep learning for specific use cases such as image and speech recognition, fraud detection, and predictive maintenance. Alongside these, intuitive data visualization tools are simplifying complex neural network outputs, helping stakeholders understand and validate insights.

    However, challenges remain, including the need for powerful computing resources, data privacy concerns, and the high cost of implementing and maintaining deep learning systems. Despite these hurdles, the market's potential for innovation and disruption is immense, making it an exciting space for businesses to explore further. Semi-supervised learning, data labeling, and data cleaning facilitate efficient training of deep learning models. Cloud analytics is another significant trend, as companies seek to leverage cloud computing for cost savings and scalability.

    What will be the size of the market during the forecast period?


    Deep learning, a subset of machine learning, continues to shape industries by enabling advanced applications such as image and speech recognition, text generation, and pattern recognition. Reinforcement learning, a type of deep learning, gains traction, with deep reinforcement learning leading the charge. Anomaly detection, a crucial application of unsupervised learning, safeguards systems against security vulnerabilities. Ethical implications and fairness considerations are increasingly important in deep learning, with emphasis on explainable AI and model interpretability. Graph neural networks and attention mechanisms enhance data preprocessing for sequential data modeling and object detection. Time series forecasting and dataset creation further expand deep learning's reach, while privacy preservation and bias mitigation ensure responsible use.

    In summary, deep learning's market dynamics reflect a constant pursuit of innovation, efficiency, and ethical considerations. The Deep Learning Market in the US is flourishing as organizations embrace intelligent systems powered by supervised learning and emerging self-supervised learning techniques. These methods refine predictive capabilities and reduce reliance on labeled data, boosting scalability. BFSI firms utilize AI image recognition for various applications, including personalizing customer communication, maintaining a competitive edge, and automating repetitive tasks to boost productivity. Sophisticated feature extraction algorithms now enable models to isolate patterns with high precision, particularly in applications such as image classification for healthcare, security, and retail.

    How is this market segmented and which is the largest segment?

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    • Application
      • Image recognition
      • Voice recognition
      • Video surveillance and diagnostics
      • Data mining
    • Type
      • Software
      • Services
      • Hardware
    • End-user
      • Security
      • Automotive
      • Healthcare
      • Retail and commerce
      • Others
    • Geography
      • North America
        • US

    By Application Insights

    The Image recognition segment is estimated to witness significant growth during the forecast period. In the realm of artificial intelligence (AI) and machine learning, image recognition, a subset of computer vision, is gaining significant traction. This technology utilizes neural networks, deep learning models, and various machine learning algorithms to decipher visual data from images and videos. Image recognition is instrumental in numerous applications, including visual search, product recommendations, and inventory management. Consumers can take photographs of products to discover similar items, enhancing the online shopping experience. In the automotive sector, image recognition is indispensable for advanced driver assistance systems (ADAS) and autonomous vehicles, enabling the identification of pedestrians, other vehicles, road signs, and lane markings.

    Furthermore, image recognition plays a pivotal role in augmented reality (AR) and virtual reality (VR) applications, where it tracks physical objects and overlays digital content onto real-world scenarios. The model training process involves the backpropagation algorithm, which calculates the loss function …

  17. BI Analysis Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jul 15, 2025
    Cite
    Data Insights Market (2025). BI Analysis Software Report [Dataset]. https://www.datainsightsmarket.com/reports/bi-analysis-software-1963150
    Explore at:
    Available download formats: doc, pdf, ppt
    Dataset updated
    Jul 15, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Business Intelligence (BI) analysis software market is booming, driven by big data and cloud computing. Discover key trends, growth projections (2025-2033), leading companies (Microsoft, Tableau, SAP, etc.), and regional market shares in our comprehensive analysis. Learn how BI is transforming decision-making across industries.

  18. Simulated Object-Centric Event Logs (OCEL 2.0) for Order-to-Cash,...

    • zenodo.org
    xml
    Updated Oct 2, 2024
    Cite
    Alessandro Berti (2024). Simulated Object-Centric Event Logs (OCEL 2.0) for Order-to-Cash, Procure-to-Pay, Hiring, and Hospital Patient Lifecycle Processes [Dataset]. http://doi.org/10.5281/zenodo.13879980
    Explore at:
    Available download formats: xml
    Dataset updated
    Oct 2, 2024
    Dataset provided by
    Zenodo
    Authors
    Alessandro Berti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains simulated object-centric event logs for four distinct business processes: Order-to-Cash (O2C), Procure-to-Pay (P2P), Hiring, and Hospital Patient Lifecycle. Each process is designed to reflect realistic workflows, encompassing multiple object types and capturing key activities, decision points, and process dynamics. The dataset is aimed at providing a rich source of data for process mining, analysis, and modeling activities.

    1. Order-to-Cash (O2C):
    The O2C process simulates an end-to-end business flow starting from customer order placement to payment receipt. It includes diverse activities such as order approval, fulfillment, invoice generation, and payment processing, involving object types like Customers, Orders, Products, and Invoices. The dataset captures variability through random decisions, synchronization between departments, and workarounds in credit checks and inventory adjustments. Attributes such as customer tiers, order values, and shipment statuses add further depth, allowing for detailed analysis of this complex process.

    2. Procure-to-Pay (P2P):
    The P2P process simulates the procurement lifecycle, from requisition creation to payment of suppliers. Key activities include purchase order creation, three-way matching, goods receipt, and payment processing. The event log records object types such as Purchase Requisitions, Purchase Orders, Suppliers, and Invoices. Variability is introduced through approval decisions, batching, and potential mismatches in the matching process. The dataset represents the inherent complexities of real-world procurement operations, including batching and synchronization issues between different process stages.

    3. Hiring Process:
    The hiring process log tracks the recruitment lifecycle, from job requisition creation to onboarding. It includes object types like Candidates, Job Requisitions, Recruiters, and Interviewers. The process covers activities such as resume screening, interviews, assessments, and offer management. Variability in the hiring process is introduced through random delays, candidate decisions, and background check durations. Batching occurs in stages like resume screening and onboarding, while synchronization challenges arise during interview scheduling.

    4. Hospital Patient Lifecycle:
    This log represents the lifecycle of patients within a hospital, capturing interactions with multiple resources such as physicians, beds, and medical equipment. The process begins with pre-admission activities, followed by diagnosis, treatment, and discharge. The dataset includes object types like Patients, Physicians, and Medical Equipment, with attributes related to patient demographics and event severity. The process reflects the dynamic nature of hospital operations, including synchronization of resources and the occurrence of workarounds in case of delays or resource unavailability.

    Each process simulation captures high variability, synchronization issues, and batching, making this dataset suitable for analyzing real-world operational challenges. The logs provide a comprehensive view of complex workflows, supporting advanced analysis, including object-centric process mining.

    This description provides the necessary details about the dataset, highlighting its structure, purpose, and potential uses for researchers and process analysts.

    Object-centric event logs conceived and simulated by the o1-preview-2024-09-12 LRM, using the https://github.com/fit-alessandro-berti/llm-ocel-simulator project.

  19. PLOS ONE publication and citation data

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +2more
    zip
    Updated May 15, 2023
    Cite
    Alexander Petersen (2023). PLOS ONE publication and citation data [Dataset]. http://doi.org/10.6071/M39W8V
    Explore at:
    Available download formats: zip
    Dataset updated
    May 15, 2023
    Dataset provided by
    University of California, Merced
    Authors
    Alexander Petersen
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Merged PLOS ONE and Web of Science data compiled in .dta files produced by STATA13. Included is a Do-file for reproducing the regression model estimates reported in the pre-print (Tables I and II) and published version (Table 1). Each observation (.dta line) corresponds to a given PLOS ONE article, with various article-level and editor-level characteristics used as explanatory and control variables. This summary provides a brief description of each variable and its source.

    If you use this data, please cite: A. M. Petersen. Megajournal mismanagement: Manuscript decision bias and anomalous editor activity at PLOS ONE. Journal of Informetrics 13, 100974 (2019). DOI: 10.1016/j.joi.2019.100974

    Methods

    We gathered the citation information for all PLOS ONE articles, indexed by A, from the Web of Science (WOS) Core Collection. From these data we obtained a master list of the unique digital object identifiers, DOI_A, and the number of citations, c_A, at the time of the data download (census) date:

    (a) For the pre-print this corresponds to December 3, 2016;

    (b) and for the final published article this corresponds to February 25, 2019.

    We then used each DOI_A to access the corresponding online XML version of each article at PLOS ONE by visiting the unique web address "http://journals.plos.org/plosone/article?id=" + "DOI_A". After parsing the full-text XML (primarily the author byline data and reference list), we merged the PLOS ONE publication information and WOS citation data by matching on DOI_A.

    allofplos: PLOS has since made all full-text XML data freely available: https://www.plos.org/text-and-data-mining ; this option was not available at the time of our data collection.
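A sketch of the retrieval step as described (the URL pattern is quoted verbatim from the text; whether the endpoint still serves the XML today is not guaranteed):

```python
import requests

def fetch_plos_article(doi):
    # URL pattern per the methods description above
    url = "http://journals.plos.org/plosone/article?id=" + doi
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.text  # full text to be parsed for the author byline and reference list

# usage: fetch_plos_article("10.1371/journal.pone.0000001")  # placeholder DOI
```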

  20. Dataset B

    • figshare.com
    xlsx
    Updated Jul 18, 2017
    Cite
    Suhailan Safei (2017). Dataset B [Dataset]. http://doi.org/10.6084/m9.figshare.5216377.v2
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 18, 2017
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Suhailan Safei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    UEFA Championship ranking-based clustering output
