97 datasets found
  1. Label Classifier Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 31, 2025
    Cite
    Data Insights Market (2025). Label Classifier Report [Dataset]. https://www.datainsightsmarket.com/reports/label-classifier-504593
    Explore at:
    Available download formats: doc, ppt, pdf
    Dataset updated
    May 31, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Label Classifier market is experiencing robust growth, driven by the increasing adoption of machine learning and artificial intelligence across diverse sectors. The market's expansion is fueled by the need for efficient and accurate data annotation and classification in applications ranging from image recognition and natural language processing to medical diagnosis and fraud detection. The rising volume of unstructured data and the need for automated data analysis are key catalysts for this growth. While precise market sizing data wasn't provided, considering the involvement of major tech players like Google, Microsoft, and Amazon, along with specialized AI companies, a reasonable estimate for the 2025 market size could be in the range of $500 million to $1 billion, depending on the specific definition of "Label Classifier" and the inclusion of related technologies. A Compound Annual Growth Rate (CAGR) of 25-30% over the forecast period (2025-2033) seems realistic given the current technological advancements and market demand.

    This growth is anticipated to continue, fueled by several factors. Advancements in deep learning algorithms, improved computational power, and the availability of larger datasets are enhancing the accuracy and efficiency of label classifiers. Furthermore, the increasing demand for automation in various industries, coupled with the growing need for real-time insights from data, will propel the market forward. However, challenges such as data security concerns, the need for skilled professionals to develop and maintain these systems, and the high computational costs associated with complex label classifiers could potentially act as restraints.

    The market is segmented based on deployment (cloud, on-premise), application (image recognition, text analysis, etc.), and industry (healthcare, finance, etc.). Key players are actively investing in research and development, expanding their product portfolios, and forging strategic partnerships to maintain a competitive edge in this rapidly evolving market. The competitive landscape is dynamic, with both established tech giants and specialized AI startups vying for market share.

  2. FDA Online Label Repository

    • catalog.data.gov
    • healthdata.gov
    • +5more
    Updated Jul 11, 2025
    Cite
    U.S. Food and Drug Administration (2025). FDA Online Label Repository [Dataset]. https://catalog.data.gov/dataset/fda-online-label-repository
    Explore at:
    Dataset updated
    Jul 11, 2025
    Dataset provided by
    Food and Drug Administration (http://www.fda.gov/)
    Description

    The drug labels and other drug-specific information on this Web site represent the most recent drug listing information companies have submitted to the Food and Drug Administration (FDA). (See 21 CFR part 207.) The drug labeling and other information has been reformatted to make it easier to read, but its content has neither been altered nor verified by FDA. The drug labeling on this Web site may not be the labeling on currently distributed products or identical to the labeling that is approved. Most OTC drugs are not reviewed and approved by FDA; however, they may be marketed if they comply with applicable regulations and policies described in monographs. Drugs marked 'OTC monograph final' or 'OTC monograph not final' are not checked for conformance to the monograph. Drugs marked 'unapproved medical gas', 'unapproved homeopathic' or 'unapproved drug other' on this Web site have not been evaluated by FDA for safety and efficacy and their labeling has not been approved. In addition, FDA is not aware of scientific evidence to support homeopathy as effective.

  3. FSDnoisy18k

    • explore.openaire.eu
    • opendatalab.com
    • +3more
    Updated Jan 3, 2019
    Cite
    Eduardo Fonseca; Mercedes Collado; Manoj Plakal; Daniel P. W. Ellis; Frederic Font; Xavier Favory; Xavier Serra (2019). FSDnoisy18k [Dataset]. http://doi.org/10.5281/zenodo.2529933
    Explore at:
    Dataset updated
    Jan 3, 2019
    Authors
    Eduardo Fonseca; Mercedes Collado; Manoj Plakal; Daniel P. W. Ellis; Frederic Font; Xavier Favory; Xavier Serra
    Description

    FSDnoisy18k is an audio dataset collected with the aim of fostering the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.

    Data curators: Eduardo Fonseca and Mercedes Collado. Contact: You are welcome to contact Eduardo Fonseca should you have any questions at eduardo.fonseca@upf.edu.

    Citation: If you use this dataset or part of it, please cite the following ICASSP 2019 paper: Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, and Xavier Serra, “Learning Sound Event Classifiers from Web Audio with Noisy Labels”, arXiv preprint arXiv:1901.01189, 2019. You can also consider citing our ISMIR 2017 paper that describes the Freesound Annotator, which was used to gather the manual annotations included in FSDnoisy18k: Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, “Freesound Datasets: A Platform for the Creation of Open Audio Datasets”, in Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017.

    FSDnoisy18k description: What follows is a summary of the most basic aspects of FSDnoisy18k. For a complete description, check the FSDnoisy18k companion site (http://www.eduardofonseca.net/FSDnoisy18k/) and the description provided in Section 2 of our ICASSP 2019 paper. The source of audio content is Freesound, a sound sharing site created and maintained by the Music Technology Group that hosts over 400,000 clips uploaded by its community of users, who additionally provide some basic metadata (e.g., tags and title). The 20 classes of FSDnoisy18k are drawn from the AudioSet Ontology and are selected based on data availability as well as on their suitability to allow the study of label noise. The 20 classes are: "Acoustic guitar", "Bass guitar", "Clapping", "Coin (dropping)", "Crash cymbal", "Dishes, pots, and pans", "Engine", "Fart", "Fire", "Fireworks", "Glass", "Hi-hat", "Piano", "Rain", "Slam", "Squeak", "Tearing", "Walk, footsteps", "Wind", and "Writing". FSDnoisy18k was created with the Freesound Annotator, a platform for the collaborative creation of open audio datasets.

    We defined a clean portion of the dataset consisting of correct and complete labels; the remaining portion is referred to as the noisy portion. Each clip in the dataset has a single ground truth label (singly-labeled data). The clean portion consists of audio clips whose labels are rated as present in the clip and predominant (almost all with full inter-annotator agreement), meaning that the label is correct and, in most cases, there is no additional acoustic material other than the labeled class. A few clips may contain some additional sound events, but they occur in the background and do not belong to any of the 20 target classes. This is more common for some classes that rarely occur alone, e.g., “Fire”, “Glass”, “Wind” or “Walk, footsteps”. The noisy portion consists of audio clips that received no human validation; they are categorized on the basis of the user-provided tags in Freesound, and hence this portion features a certain amount of label noise.

    Code: We've released the code for our ICASSP 2019 paper at https://github.com/edufonseca/icassp19. The framework comprises all the basic stages: feature extraction, training, inference and evaluation. After loading the FSDnoisy18k dataset, log-mel energies are computed and a CNN baseline is trained and evaluated. The code also allows testing four noise-robust loss functions. Please check our paper for more details.

    Label noise characteristics: FSDnoisy18k features real label noise that is representative of audio data retrieved from the web, particularly from Freesound. The analysis of a per-class, random 15% of the noisy portion of FSDnoisy18k revealed that roughly 40% of the analyzed labels are correct and complete, whereas 60% of the labels show some type of label noise. Please check the FSDnoisy18k companion site for a detailed characterization of the label noise in the dataset, including a taxonomy of label noise for singly-labeled data as well as a per-class description of the label noise.

    Basic characteristics: FSDnoisy18k contains 18,532 audio clips (42.5 h) unequally distributed across the 20 aforementioned classes drawn from the AudioSet Ontology. The audio clips are provided as uncompressed PCM 16 bit, 44.1 kHz, mono audio...
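    The Code paragraph above mentions that log-mel energies are computed before the CNN baseline is trained. As a rough illustration only (not the authors' exact feature-extraction settings), a log-mel spectrogram for a single clip could be computed with librosa; the clip name and all parameters below are assumptions.

    ```python
    # Hedged sketch: log-mel energies for one FSDnoisy18k clip.
    # Window/hop/mel settings are illustrative, not the paper's values.
    import librosa

    clip_path = "FSDnoisy18k.audio_train/17.wav"  # hypothetical file name

    y, sr = librosa.load(clip_path, sr=44100, mono=True)   # clips are 44.1 kHz mono PCM
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                         hop_length=512, n_mels=96)
    log_mel = librosa.power_to_db(mel)                      # log-mel energies
    print(log_mel.shape)  # (n_mels, n_frames)
    ```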

  4. In Mold Labelling Market Analysis, Size, and Forecast 2025-2029: Europe...

    • technavio.com
    Cite
    Technavio, In Mold Labelling Market Analysis, Size, and Forecast 2025-2029: Europe (France, Germany, Italy, Spain, UK), North America (US and Canada), APAC (China, Japan), Middle East and Africa, and South America (Brazil) [Dataset]. https://www.technavio.com/report/in-mold-labelling-market-industry-analysis
    Explore at:
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Canada, United States, Germany, Global
    Description


    In Mold Labelling Market Size 2025-2029

    The in mold labelling market size is forecast to increase by USD 1.05 billion at a CAGR of 5.1% between 2024 and 2029.

    The In Mold Labelling (IML) market is experiencing significant growth due to the increasing global manufacturing output, particularly in sectors such as automotive, packaging, and electronics. IML technology offers several advantages, including improved product aesthetics, reduced material usage, and enhanced branding capabilities. However, the high initial investments required for IML equipment and tooling can act as a barrier to entry for some companies. Key market trends include the increasing adoption of digital technologies, such as 3D design and simulation software, to optimize the IML design process. Additionally, the growing demand for sustainable labeling solutions is driving innovation in the market, with biodegradable and recyclable IML materials gaining popularity. The IML market is also benefiting from increasing production output in various industries, particularly in the spheres of spa, frozen food, packaging, personal care, and cosmetics.
    Companies seeking to capitalize on these opportunities must stay abreast of technological advancements and market trends while navigating the challenges of high upfront costs and regulatory compliance. By investing in research and development and forming strategic partnerships, companies can differentiate themselves in the competitive IML market and secure a strong market position.
    

    What will be the Size of the In Mold Labelling Market during the forecast period?


    The in mold labeling market in the United States is experiencing significant growth, driven by the increasing demand for labeling solutions that offer superior durability, resistance, and sustainability. Key market dynamics include labeling data management for efficient production and supply chain tracking, labeling recycling and waste reduction, and labeling traceability for enhanced product safety and regulatory compliance. Chemical-resistant labels, label resistance, and labeling automation software are critical trends, enabling manufacturers to streamline processes and reduce production costs. Labeling system integration, labeling industry leaders, and high-definition printing are also driving innovation, with advancements in label durability testing, holographic labels, glossy labels, UV curing, heat-resistant labels, label peelability, scratch-resistant labels, waterproof labels, and labeling market forecasts. IML utilizes polypropylene as the label material, enabling multi-colored prints and intricate designs.
    Functional labels, such as tactile, embossed, and matte labels, are gaining popularity due to their aesthetic appeal and added functionality. Decorative labels, metallic labels, and embossed labels are also increasingly being used for brand differentiation and consumer appeal. Labeling market analysis indicates continued growth, with a focus on labeling sustainability assessment, label removal, and labeling upcycling. The market is expected to remain competitive, with ongoing innovation trends in labeling technology and certification standards. Overall, the in mold labeling market is a dynamic and evolving industry, responding to the changing needs of consumers and businesses alike. Eco-friendly options and automation are also driving the growth of the IML market, ensuring its continued relevance as a branding tool in today's competitive business landscape.
    

    How is this In Mold Labelling Industry segmented?

    The in mold labelling industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Technology
    
      Injection molding
      Blow molding
      Thermoforming
    
    
    End-user
    
      Food and beverage
      Cosmetics
      Pharmaceuticals
      Others
    
    
    Material
    
      Polypropylene
      Polyethylene
      Polyvinyl chloride
      Acrylonitrile butadiene styrene
      Others
    
    
    Geography
    
      Europe
    
        France
        Germany
        Italy
        Spain
        UK
    
    
      North America
    
        US
        Canada
    
    
      APAC
    
        China
        Japan
    
    
      Middle East and Africa
    
    
    
      South America
    
        Brazil
    

    By Technology Insights

    The injection molding segment is estimated to witness significant growth during the forecast period. The in mold labeling market experiences significant growth due to various factors. One of these factors is the increasing demand for labeling in various industries, including healthcare, packaging, automobile, consumer goods, and electronics. Injection molding machines, a crucial component in the in mold labeling process, are in high demand due to their versatility and efficiency. These machines, consisting of an injection unit and a clamping unit, enable th

  5. Data from: ImageNet Dataset

    • paperswithcode.com
    Updated Feb 2, 2021
    Cite
    Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Fei-Fei Li (2021). ImageNet Dataset [Dataset]. https://paperswithcode.com/dataset/imagenet
    Explore at:
    Dataset updated
    Feb 2, 2021
    Authors
    Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Fei-Fei Li
    Description

    The ImageNet dataset contains 14,197,122 annotated images organized according to the WordNet hierarchy. Since 2010 the dataset has been used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images, therefore only thumbnails and URLs of images are provided.

    Total number of non-empty WordNet synsets: 21,841
    Total number of images: 14,197,122
    Number of images with bounding box annotations: 1,034,908
    Number of synsets with SIFT features: 1,000
    Number of images with SIFT features: 1.2 million

  6. Data from: Processed Lab Data for Neural Network-Based Shear Stress Level...

    • catalog.data.gov
    • data.openei.org
    • +3more
    Updated Jan 20, 2025
    Cite
    Pennsylvania State University (2025). Processed Lab Data for Neural Network-Based Shear Stress Level Prediction [Dataset]. https://catalog.data.gov/dataset/processed-lab-data-for-neural-network-based-shear-stress-level-prediction-309d2
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    Pennsylvania State University
    Description

    Machine learning can be used to predict fault properties such as shear stress, friction, and time to failure using continuous records of fault zone acoustic emissions. The files are extracted features and labels from lab data (experiment p4679). The features are extracted with a non-overlapping window from the original acoustic data. The first column is the time of the window. The second and third columns are the mean and the variance of the acoustic data in this window, respectively. The 4th-11th columns are the power spectral density ranging from low to high frequency. The last column is the corresponding label (shear stress level). The file name indicates which driving velocity the sequence was generated from.

    Data were generated from laboratory friction experiments conducted with a biaxial shear apparatus. Experiments were conducted in the double direct shear configuration in which two fault zones are sheared between three rigid forcing blocks. Our samples consisted of two 5-mm-thick layers of simulated fault gouge with a nominal contact area of 10 by 10 cm^2. Gouge material consisted of soda-lime glass beads with initial particle size between 105 and 149 micrometers. Prior to shearing, we impose a constant fault normal stress of 2 MPa using a servo-controlled load-feedback mechanism and allow the sample to compact. Once the sample has reached a constant layer thickness, the central block is driven down at a constant rate of 10 micrometers per second. In tandem, we collect an AE signal continuously at 4 MHz from a piezoceramic sensor embedded in a steel forcing block about 22 mm from the gouge layer. The data from this experiment can be used with the deep learning algorithm to train it for future fault property prediction.
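    As a minimal illustration of the column layout described above, one of the feature files could be read as follows; the file name and delimiter are assumptions, since only the column order is documented here.

    ```python
    # Hedged sketch: parse an extracted-feature file from experiment p4679.
    # Columns: time, mean, variance, 8 PSD bins (low to high frequency), label.
    import numpy as np

    data = np.loadtxt("p4679_features_v10.txt", delimiter=",")  # hypothetical name/format

    time     = data[:, 0]      # window timestamp
    mean_amp = data[:, 1]      # mean of the acoustic signal in the window
    var_amp  = data[:, 2]      # variance of the acoustic signal in the window
    psd      = data[:, 3:11]   # power spectral density, columns 4-11
    label    = data[:, 11]     # shear stress level
    print(psd.shape, np.unique(label))
    ```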

  7. Replication Data for: Detecting Voter Understanding of Ideological Labels...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 14, 2023
    Cite
    Miwa, Hirofumi; Arami, Reiko; Taniguchi, Masaki (2023). Replication Data for: Detecting Voter Understanding of Ideological Labels Using a Conjoint Experiment [Dataset]. http://doi.org/10.7910/DVN/FIHGN0
    Explore at:
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Miwa, Hirofumi; Arami, Reiko; Taniguchi, Masaki
    Description

    Understanding voters’ conception of ideological labels is critical for political behavioral research. Conventional research designs have several limitations, such as endogeneity, insufficient responses to open-ended questions, and inseparability of composite treatment effects. To address these challenges, we propose a conjoint experiment to study the meanings ascribed to ideological labels in terms of policy positions. We also suggest using a mixture model approach to explore heterogeneity in voters’ understandings of ideological labels, as well as the average interpretation of labels. We applied these approaches to conceptions of left–right labels in Japan, where the primary issue of elite-level conflicts has been distinctive compared with other developed countries. We found that, on average, while Japanese voters understand policy-related meanings of “left” and “right,” they primarily associate these labels with security and nationalism, and, secondarily, with social issues; they do not associate these labels with economic issues. Voters’ understandings partly depend on their birth cohort, but observed patterns do not necessarily coincide with what many researchers would predict regarding generational differences in Japanese politics. Mixture model results suggest that some individuals tend to associate left–right labels with security and nationalism policies, while others link them to social policies. Over one-third of respondents seemed to barely understand the usage of left–right labels in policy positions. Our study improves upon existing methods for measuring voter understanding of ideological labels, and reconfirms the global diversity of meanings associated with left–right labels.

  8. Labeled Temporal Brain Networks

    • entrepot.recherche.data.gouv.fr
    txt, zip
    Updated Jul 21, 2023
    Cite
    Aurora ROSSI (2023). Labeled Temporal Brain Networks [Dataset]. http://doi.org/10.57745/HHNT10
    Explore at:
    Available download formats: txt (1498), zip (648811279)
    Dataset updated
    Jul 21, 2023
    Dataset provided by
    Recherche Data Gouv
    Authors
    Aurora ROSSI
    License

    https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.57745/HHNT10

    Dataset funded by
    French government, National Research Agency (ANR)
    Description

    Labeled Temporal Brain Networks

    This dataset contains a collection of temporal brain networks of 100 subjects. Each subject has a label representing their biological sex ("M" for male and "F" for female) and age range (22-25, 26-30, 31-35 and 36+). The networks are obtained from resting-state fMRI data from the Human Connectome Project (HCP) and are undirected and weighted. The number of nodes is fixed at 202, while the edge weights change their values over time.

    Dataset structure: The networks.zip file contains the networks as .txt files in the following format: the first line of each .txt file contains the number of nodes and the number of snapshots of the network, separated by a space. The following lines contain the list of edges of the network in the form i,j,t,w, meaning that the edge between node i and node j at time t has weight w. The labels are contained in the file labels.txt, which has three columns separated by a space: the first column is the identifier of a subject, the second is the biological sex, and the last is an age range.

    Acknowledgments: Data were provided by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research, and by the McDonnell Center for Systems Neuroscience at Washington University. The authors are grateful to the OPAL infrastructure from Université Côte d'Azur for providing resources and support. This work has been supported by the French government, through the UCA DS4H Investments in the Future project managed by the National Research Agency (ANR) with the reference number ANR-17-EURE-0004.
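    Since the file format is spelled out above (a header line with the number of nodes and snapshots, then one i,j,t,w edge per line, plus a space-separated labels.txt), a minimal parser might look like the sketch below; the subject file name is a hypothetical example.

    ```python
    # Hedged sketch: read one temporal network and the label file.
    def read_temporal_network(path):
        with open(path) as fh:
            n_nodes, n_snapshots = map(int, fh.readline().split())
            edges = []
            for line in fh:
                if line.strip():
                    i, j, t, w = line.strip().split(",")
                    edges.append((int(i), int(j), int(t), float(w)))
        return n_nodes, n_snapshots, edges

    def read_labels(path="labels.txt"):
        # columns: subject identifier, biological sex ("M"/"F"), age range
        with open(path) as fh:
            return [tuple(line.split()) for line in fh if line.strip()]

    n_nodes, n_snapshots, edges = read_temporal_network("100206.txt")  # hypothetical file
    print(n_nodes, n_snapshots, len(edges))
    ```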

  9. Lig-PCDB: Labeled Databases of X-ray Ligands Images in 3D Point Clouds and...

    • zenodo.org
    Updated Apr 24, 2025
    Cite
    Cristina F Bazzano; Daniela B. B. Trivella; Guilherme P. Telles; Luiz G. Alves (2025). Lig-PCDB: Labeled Databases of X-ray Ligands Images in 3D Point Clouds and Validated Deep Learning Models [Dataset]. http://doi.org/10.5281/zenodo.7872578
    Explore at:
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Cristina F Bazzano; Daniela B. B. Trivella; Guilherme P. Telles; Luiz G. Alves
    Description

    LigPCDS: Labeled Dataset of X-ray Protein Ligand 3D Images in Point Clouds and Validated Deep Learning Models

    The difference electron density from X-ray protein crystallography was used to create the first dataset of labeled images of ligands in 3D point clouds, named LigPCDS.

    Four proposed vocabularies were validated by successfully training good-performance deep learning models for the semantic segmentation of a stratified dataset from Lig-PCDB. The data from organic molecules (ligands) was obtained from the worldwide Protein Data Bank with resolutions ranging from 1.5 to 2.2 Å. The ligands' images were interpolated from their calculated difference electron density map in a 3D grid-like bounding box around their atomic positions, and stored in point clouds. A grid spacing of 0.5 Å gave the best results. The density value of the grid points was used as the feature. The labeling approach used the structure of the ligands to propose vocabularies of chemical classes based on the chemical atoms themselves and their cyclic substructures. These annotations were applied pointwise to the ligands' images using an atomic sphere model. The databases and validated models may be used to tackle problems regarding known and unknown ligand building in drug discovery and fragment screening pipelines.
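    The atomic sphere labeling mentioned above can be pictured with a small conceptual sketch: a grid point inherits an atom's class if it falls inside that atom's sphere. The radii and class encoding below are illustrative assumptions, not the values used to build Lig-PCDB.

    ```python
    # Conceptual sketch of pointwise labeling with an atomic sphere model.
    import numpy as np

    def label_points(points, atom_xyz, atom_class, atom_radius):
        """points: (N, 3) grid coordinates; atom_*: per-atom (M, 3), (M,), (M,) arrays."""
        labels = np.zeros(len(points), dtype=int)             # 0 = unlabeled background
        for xyz, cls, radius in zip(atom_xyz, atom_class, atom_radius):
            inside = np.linalg.norm(points - xyz, axis=1) <= radius
            labels[inside] = cls                               # points inside the sphere get the atom's class
        return labels
    ```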

    The four validated deep learning models are: (i) the LigandRegion, composed of generic atoms of any type; (ii) the AtomCycle, composed of generic atoms outside cycles and generic cycles; (iii) the AtomC347CA56, composed of generic atoms outside cycles, non-aromatic cycles of sizes 3 to 7, and aromatic cycles of sizes 5 and 6; and (iv) the AtomSymbolGroups, composed of the atom symbols with groupings. The mean accuracy of these models in their cross-validation was between 49.7% and 77.4% in terms of the Intersection over Union (mIoU) metric and between 62.4% and 87.0% in F1-score (mF1).

    The code used to create and validate Lig-PCDB is available at the following repository: https://github.com/danielatrivella/np3_ligand

    This repository also contains the NP³ Blob Label application for ligand building using the validated deep learning models from Lig-PCDB.

    License

    LigPCDS by Cristina Freitas Bazzano, Luiz G. Alves, Guilherme P. Telles, Daniela B. B. Trivella is marked with CC0 1.0 Universal.

  10. ramp Building Footprint Dataset - N'Djamena, Chad

    • access.earthdata.nasa.gov
    • cmr.earthdata.nasa.gov
    Updated Oct 10, 2023
    Cite
    (2023). ramp Building Footprint Dataset - N'Djamena, Chad [Dataset]. http://doi.org/10.34911/rdnt.b0noju
    Explore at:
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Area covered
    Description

    This chipped training dataset is over N'Djamena and includes high-resolution imagery (.tif format) and corresponding building footprint vector labels (.geojson format) in 256 x 256 pixel tile/label pairs. This dataset is a ramp Tier 2 dataset, meaning it has NOT been thoroughly reviewed and improved. This dataset was produced for the ramp project and contains 3,044 tiles and 124,208 individual buildings. The satellite imagery resolution is 45 cm and was sourced from Maxar ODP (10300100AA405C00). Dataset keywords: Urban, Peri-urban, Rural
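    A minimal sketch for opening one imagery tile and its building-footprint labels is shown below; the chip and label file names are assumptions, and rasterio/geopandas are just one convenient way to read .tif and .geojson pairs.

    ```python
    # Hedged sketch: load one 256 x 256 tile/label pair from the ramp dataset.
    import rasterio            # reads the .tif imagery chip
    import geopandas as gpd    # reads the .geojson building footprints

    with rasterio.open("chips/ndjamena_0001.tif") as src:      # hypothetical file name
        image = src.read()                                     # (bands, 256, 256)
        bounds = src.bounds

    buildings = gpd.read_file("labels/ndjamena_0001.geojson")  # hypothetical file name
    print(image.shape, len(buildings), bounds)
    ```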

  11. Antisemitism on Twitter: A Dataset for Machine Learning and Text Analytics

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 13, 2024
    Cite
    Soemer, Katharina (2024). Antisemitism on Twitter: A Dataset for Machine Learning and Text Analytics [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7872834
    Explore at:
    Dataset updated
    Dec 13, 2024
    Dataset provided by
    Soemer, Katharina
    Miehling, Daniel
    Karali, Sameer
    Jikeli, Gunther
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset from the Institute for the Study of Contemporary Antisemitism (ISCA) at Indiana University:

    The Social Media & Hate research lab at the Institute for the Study of Contemporary Antisemitism compiled this dataset using an annotation portal (Jikeli, Soemer, and Karali 2024), which was used to label tweets as either antisemitic or non-antisemitic, among other labels. Note that annotation was done on live data, including images and context, such as threads. All data was annotated by two experts, and all discrepancies were discussed (Jikeli et al. 2023).

    Content:

    This dataset contains 11,311 tweets covering a wide range of topics common in conversations about Jews, Israel, and antisemitism between January 2019 and April 2023. The dataset consists of random samples of relevant keywords during this time period. 1,953 tweets (17%) are antisemitic according to the IHRA definition of antisemitism.

    The distribution of tweets by year is as follows: 1,499 (13%) from 2019, 3,712 (33%) from 2020, 2,591 (23%) from 2021, 2,644 (23%) from 2022, and 865 (8%) from 2023. 6,365 (56%) contain the keyword "Jews," 4,134 (37%) include "Israel," 529 (5%) feature the derogatory term "ZioNazi*," and 283 (3%) use the slur "K---s." Some tweets may contain multiple keywords.

    725 out of the 6,365 tweets with the keyword "Jews" (11%) and 664 out of the 4,134 tweets with the keyword "Israel" (16%) were classified as antisemitic. 97 out of the 283 tweets using the antisemitic slur "K---s" (34%) are antisemitic. Interestingly, many tweets featuring the slur "K---s" actually call out its use. In contrast, the majority of tweets using the derogatory term "ZioNazi*" are antisemitic, with 467 out of 529 (88%) being classified as such.

    File Description:

    The dataset is provided in a csv file format, with each row representing a single message, including replies, quotes, and retweets. The file contains the following columns:

    ‘ID’: Represents the tweet ID.

    ‘Username’: Represents the username that posted the tweet.

    ‘Text’: Represents the full text of the tweet (not pre-processed).

    ‘CreateDate’: Represents the date on which the tweet was created.

    ‘Biased’: Represents the label given by our annotations as to whether the tweet is antisemitic or not.

    ‘Keyword’: Represents the keyword that was used in the query. The keyword can be in the text, including hashtags, mentioned users, or the username itself.
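    A minimal loading sketch based on the columns above follows; the CSV file name is an assumption, and the snippet assumes the ‘Biased’ label uses a 0/1 encoding.

    ```python
    # Hedged sketch: load the tweet CSV and summarize labels by keyword and year.
    import pandas as pd

    df = pd.read_csv("isca_antisemitism_tweets.csv")  # hypothetical file name

    # Share of antisemitic tweets per query keyword (assumes 0/1 'Biased' labels)
    print(df.groupby("Keyword")["Biased"].mean())

    # Tweets per year, from the creation date
    df["CreateDate"] = pd.to_datetime(df["CreateDate"])
    print(df["CreateDate"].dt.year.value_counts().sort_index())
    ```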

    Licences

    Data is published under the terms of the "Creative Commons Attribution 4.0 International" licence (https://creativecommons.org/licenses/by/4.0)

    Acknowledgements

    We are grateful for the support of Indiana University’s Observatory on Social Media (OSoMe) (Davis et al. 2016) and the contributions and annotations of all team members in our Social Media & Hate Research Lab at Indiana University’s Institute for the Study of Contemporary Antisemitism, especially Grace Bland, Elisha S. Breton, Kathryn Cooper, Robin Forstenhäusler, Sophie von Máriássy, Mabel Poindexter, Jenna Solomon, Clara Schilling, and Victor Tschiskale.

    This work used Jetstream2 at Indiana University through allocation HUM200003 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.

  12. Data from: Comment on the Definition and Labeling of pK50

    • acs.figshare.com
    txt
    Updated Aug 21, 2023
    Cite
    Mark A. Watson; Ryne C. Johnston; Art Bochevarov (2023). Comment on the Definition and Labeling of pK50 [Dataset]. http://doi.org/10.1021/acs.jcim.3c01210.s001
    Explore at:
    Available download formats: txt
    Dataset updated
    Aug 21, 2023
    Dataset provided by
    ACS Publications
    Authors
    Mark A. Watson; Ryne C. Johnston; Art Bochevarov
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    We propose a more rigorous definition for the recently introduced concept of pK50. The value of pK50 should be associated not with a “functional group”, as originally postulated, but instead with an atom. The proposed clarification is meant to improve the interpretation and labeling of pK50.

  13. ramp Building Footprint Dataset - Manjama, Sierra Leone

    • cmr.earthdata.nasa.gov
    • access.earthdata.nasa.gov
    Updated Oct 10, 2023
    Cite
    (2023). ramp Building Footprint Dataset - Manjama, Sierra Leone [Dataset]. http://doi.org/10.34911/rdnt.fp33ih
    Explore at:
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Area covered
    Description

    This chipped training dataset is over Manjama and includes high-resolution imagery (.tif format) and corresponding building footprint vector labels (.geojson format) in 256 x 256 pixel tile/label pairs. This dataset is a ramp Tier 1 dataset, meaning it has been thoroughly reviewed and improved. This dataset was used in developing the ramp baseline model and contains 4,671 tiles and 60,379 individual buildings. The satellite imagery resolution is 30 cm and was sourced from Maxar ODP (1040010056B6FA00). Dataset keywords: Urban, Peri-Urban.

  14. Bangla Multilabel Cyberbully, Sexual Harrasment, Threat and Spam Detection...

    • data.mendeley.com
    Updated Jul 16, 2024
    Cite
    Saieef Sunny (2024). Bangla Multilabel Cyberbully, Sexual Harrasment, Threat and Spam Detection Dataset [Dataset]. http://doi.org/10.17632/sz5558wrd4.3
    Explore at:
    Dataset updated
    Jul 16, 2024
    Authors
    Saieef Sunny
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Overview The Bangla Multilabel Cyberbully, Sexual Harassment, Threat, and Spam Detection Dataset is designed to facilitate the development of machine learning models to detect and classify various types of abusive content in Bangla social media text. This dataset contains a collection of comments annotated for multiple types of abuse, making it suitable for multilabel classification tasks. It aims to support research and development in natural language processing (NLP) to enhance online safety and moderate harmful content on Bangla language social media platforms.

    Purpose
    1. Train and evaluate machine learning models for detection of cyberbullying, sexual harassment, religious hate speech, threats, and spam in Bangla comments.
    2. Support research in NLP and machine learning focused on Bangla, a low-resource language.
    3. Aid in developing automated moderation systems for social media platforms to ensure safe and respectful communication.

    Data Collection Initially, we collected around 30,000 comments from social media platforms like Facebook and TikTok. These comments were in Bangla, English, and Banglish (Bangla written using English characters). Since our research focuses on Bangla abusive text detection, we refined the dataset through the following steps:

    1. We filtered out all comments written in English to focus on the Bangla text.
    2. To ensure data quality, we eliminated duplicate entries and rows with missing or null values.
    3. We removed any remaining English characters and both Bangla and English numerical values to ensure the analysis was based solely on Bangla text.

    After these steps, we obtained a final dataset of 12,557 comments. Each comment was manually labeled into five classes: bully, sexual, religious, threat, and spam. This dataset supports multilabel annotation, meaning a comment can simultaneously belong to more than one class.

    Dataset Columns
    1. Gender: Indicates the gender of the person who received the bullying.
    2. Profession: Indicates the profession of the person who received the bullying.
    3. Comment: Contains the text of the comment in Bangla.
    4. Bully: Binary label indicating whether the comment contains bullying content (0 for no, 1 for yes).
    5. Sexual: Binary label indicating whether the comment contains sexual harassment content (0 for no, 1 for yes).
    6. Religious: Binary label indicating whether the comment contains religious hate speech (0 for no, 1 for yes).
    7. Threat: Binary label indicating whether the comment contains threats (0 for no, 1 for yes).
    8. Spam: Binary label indicating whether the comment is considered spam (0 for no, 1 for yes).
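    A minimal multilabel loading sketch based on the columns above might look as follows; the file name is an assumption.

    ```python
    # Hedged sketch: build a (comments, 5) binary label matrix from the dataset.
    import pandas as pd

    df = pd.read_csv("bangla_multilabel_comments.csv")  # hypothetical file name

    label_cols = ["Bully", "Sexual", "Religious", "Threat", "Spam"]
    texts = df["Comment"]              # raw Bangla comment text
    Y = df[label_cols].values          # binary multilabel matrix, shape (n_comments, 5)

    # A comment can belong to several classes at once
    print((Y.sum(axis=1) > 1).sum(), "comments carry more than one label")
    ```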

    Applications
    1. Training and testing machine learning models for multilabel classification.
    2. Research on natural language processing (NLP) and cyberbullying detection in low-resource languages like Bangla.
    3. Developing automated systems for monitoring and moderating online content on social media platforms to ensure safe and respectful communication.

  15. Blank Discs and Labels Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Blank Discs and Labels Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-blank-discs-and-labels-market
    Explore at:
    Available download formats: csv, pdf, pptx
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Blank Discs and Labels Market Outlook



    The global blank discs and labels market size was valued at approximately USD 1.2 billion in 2023 and is forecasted to reach USD 1.8 billion by 2032, growing at a CAGR of 4.2% during the forecast period. The growth of this market is primarily driven by the increasing demand for physical data storage solutions, despite the rise of cloud storage. The versatility and ease of use associated with blank discs and labels continue to make them a preferred choice for many consumers and businesses.



    One of the primary growth factors for the blank discs and labels market is the consistent demand for physical media storage solutions. Many industries, particularly those in specialized sectors such as healthcare, legal, and media production, continue to rely heavily on physical storage media to archive sensitive information and large data files. Moreover, the music and entertainment industry, while embracing digital distribution, still exhibits a significant demand for physical media due to the popularity of physical music albums and movie collections among enthusiasts and collectors.



    Additionally, the educational sector is contributing to the market's growth. Educational institutions often require a reliable and cost-effective method for duplicating and distributing educational content, such as lectures, tutorials, and software programs. Blank discs and labels offer a tangible medium that students can easily access without needing an internet connection. This is particularly significant in regions where internet accessibility is limited or inconsistent, making physical media an essential tool in the educational process.



    Another driving factor is the rise of small businesses and home-based entrepreneurs who utilize blank discs and printable labels for various purposes, including branding, marketing, and data storage. The availability of affordable and user-friendly disc-burning and label-printing technology has empowered these smaller entities to produce professional-looking products without incurring substantial costs. This trend is expected to continue, as more individuals and small businesses seek cost-effective and customizable solutions for their media and labeling needs.



    Regionally, the blank discs and labels market sees a varied demand pattern. North America and Europe, being technologically advanced regions, have a substantial market share. However, the Asia Pacific region is emerging as a rapidly growing market due to the increasing adoption of digital media and the expansion of the educational sector. Moreover, the presence of numerous small and medium enterprises (SMEs) in the Asia Pacific region further fuels the demand for blank discs and labels for data storage and distribution needs.



    The introduction of CD-R and CD-RW formats has significantly impacted the blank discs market, offering users the flexibility to choose between write-once and rewritable options. CD-Rs are often preferred for permanent data storage, where the information needs to remain unchanged, such as in archiving important documents or creating music albums. On the other hand, CD-RWs provide the advantage of being reusable, allowing users to erase and rewrite data multiple times. This versatility makes them ideal for applications that require frequent updates or temporary storage, such as in educational settings or for software testing. The availability of these options has broadened the appeal of blank discs, catering to a wide range of consumer and business needs.



    Product Type Analysis



    The blank discs and labels market is segmented into CDs, DVDs, Blu-ray discs, printable labels, and adhesive labels. CDs and DVDs have been traditional staples in the market, used extensively for personal and professional data storage. Despite the proliferation of digital media, CDs and DVDs maintain their relevance due to their cost-effectiveness, durability, and ease of use. They are particularly favored in regions where digital alternatives might not be as accessible or affordable.



    Blu-ray discs represent a more advanced segment, offering significantly higher storage capacity compared to CDs and DVDs. This makes them ideal for high-definition video recording, large-scale data archiving, and software distribution, especially in industries requiring robust storage solutions. The increasing production of high-definition content and the need for reliable storage options are driving the demand for Blu-ray discs, althou

  16. Uplift Modeling, Marketing Campaign Data

    • kaggle.com
    zip
    Updated Nov 1, 2020
    Cite
    Möbius (2020). Uplift Modeling, Marketing Campaign Data [Dataset]. https://www.kaggle.com/arashnic/uplift-modeling
    Explore at:
    Available download formats: zip (340156703 bytes)
    Dataset updated
    Nov 1, 2020
    Authors
    Möbius
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Uplift modeling is an important yet novel area of research in machine learning which aims to explain and to estimate the causal impact of a treatment at the individual level. In the digital advertising industry, the treatment is exposure to different ads, and uplift modeling is used to direct marketing efforts towards users for whom it is the most efficient. The data is a collection of 13 million samples from a randomized control trial, scaling up previously available datasets by a healthy 590x factor.


    Content

    The dataset was created by the Criteo AI Lab. The dataset consists of 13M rows, each one representing a user with 12 features, a treatment indicator and 2 binary labels (visits and conversions). Positive labels mean the user visited/converted on the advertiser website during the test period (2 weeks). The global treatment ratio is 84.6%. It is usual that advertisers keep only a small control population as it costs them in potential revenue.

    Following is a detailed description of the features:

    • f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)
    • treatment: treatment group (1 = treated, 0 = control)
    • conversion: whether a conversion occurred for this user (binary, label)
    • visit: whether a visit occurred for this user (binary, label)
    • exposure: treatment effect, whether the user has been effectively exposed (binary)
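    As an illustration of how the treatment indicator and binary labels fit together, a naive uplift estimate is simply the difference in outcome rates between treated and control users; the file name below is an assumption.

    ```python
    # Hedged sketch: naive uplift (treated rate minus control rate) per outcome.
    import pandas as pd

    df = pd.read_csv("criteo-uplift.csv")  # hypothetical file name

    for outcome in ["visit", "conversion"]:
        treated = df.loc[df["treatment"] == 1, outcome].mean()
        control = df.loc[df["treatment"] == 0, outcome].mean()
        print(f"{outcome}: naive uplift = {treated - control:.5f}")
    ```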



    Acknowledgement

    The data was provided for the paper: "A Large Scale Benchmark for Uplift Modeling"

    https://s3.us-east-2.amazonaws.com/criteo-uplift-dataset/large-scale-benchmark.pdf

    • Eustache Diemert CAIL e.diemert@criteo.com
    • Artem Betlei CAIL & Université Grenoble Alpes a.betlei@criteo.com
    • Christophe Renaudin CAIL c.renaudin@criteo.com
    • Massih-Reza Amini Université Grenoble Alpes massih-reza.amini@imag.fr

    For privacy reasons the data has been sub-sampled non-uniformly so that the original incrementality level cannot be deduced from the dataset while preserving a realistic, challenging benchmark. Feature names have been anonymized and their values randomly projected so as to keep predictive power while making it practically impossible to recover the original features or user context.

    Inspiration

    We can foresee related usages such as but not limited to:

    • Uplift modeling
    • Interactions between features and treatment
    • Heterogeneity of treatment


  17. Invoice Management Dataset

    • universe.roboflow.com
    zip
    Updated Dec 28, 2024
    Cite
    CVIP Workspace (2024). Invoice Management Dataset [Dataset]. https://universe.roboflow.com/cvip-workspace/invoice-management
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 28, 2024
    Dataset authored and provided by
    CVIP Workspace
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Text Bounding Boxes
    Description

    Intelligent Invoice Management System

    Project Description:
    The Intelligent Invoice Management System is an advanced AI-powered platform designed to revolutionize traditional invoice processing. By automating the extraction, validation, and management of invoice data, this system addresses the inefficiencies, inaccuracies, and high costs associated with manual methods. It enables businesses to streamline operations, reduce human error, and expedite payment cycles.

    Problem Statement:
    Manual invoice processing involves labor-intensive tasks such as data entry, verification, and reconciliation. These processes are time-consuming, prone to errors, and can result in financial losses and delays. The diversity of invoice formats from various vendors adds complexity, making automation a critical need for efficiency and scalability.

    Proposed Solution:
    The Intelligent Invoice Management System automates the end-to-end process of invoice handling using AI and machine learning techniques. Core functionalities include:
    1. Invoice Generation: Automatically generate PDF invoices in at least four formats, populated with synthetic data.
    2. Data Development: Leverage a dataset containing fields such as receipt numbers, company details, sales tax information, and itemized tables to create realistic invoice samples.
    3. AI-Powered Labeling: Use Tesseract OCR to extract labeled data from invoice images, and train YOLO for label recognition, ensuring precise identification of fields.
    4. Database Integration: Store extracted information in a structured database for seamless retrieval and analysis.
    5. Web-Based Information System: Provide a user-friendly platform to upload invoices and retrieve key metrics, such as:
    - Total sales within a specified duration.
    - Total sales tax paid during a given timeframe.
    - Detailed invoice information in tabular form for specific date ranges.

    Key Features and Deliverables:
    1. Invoice Generation:
    - Generate 20,000 invoices using an automated script.
    - Include dummy logos, company details, and itemized tables for four items per invoice.

    2. Label Definition and Format:

      • Define structured labels (TBLR, CLASS Name, Recognized Text).
      • Provide labels in both XML and JSON formats for seamless integration.
    3. OCR and AI Training:

      • Automate labeling using Tesseract OCR for high-accuracy text recognition.
      • Train and test YOLO to detect and classify invoice fields (TBLR and CLASS).
    4. Database Management:

      • Store OCR-extracted labels and field data in a database.
      • Enable efficient search and aggregation of invoice data.
    5. Web-Based Interface:

      • Build a responsive system for users to upload invoices and retrieve data based on company name or NTN.
      • Display metrics and reports for total sales, tax paid, and invoice details over custom date ranges.

    Expected Outcomes:
    - Reduction in manual effort and operational costs.
    - Improved accuracy in invoice processing and financial reporting.
    - Enhanced scalability and adaptability for diverse invoice formats.
    - Faster turnaround time for invoice-related tasks.

    By automating critical aspects of invoice management, this system delivers a robust and intelligent solution to meet the evolving needs of businesses.
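    As a rough illustration of the OCR-based labeling step described above, pytesseract can return per-word text together with top/left/width/height boxes (the "TBLR" fields); the image path is an assumption and the YOLO training step is not shown.

    ```python
    # Hedged sketch: extract word-level text and bounding boxes from an invoice image.
    from PIL import Image
    import pytesseract

    img = Image.open("invoices/invoice_00001.png")  # hypothetical file name
    ocr = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

    for text, left, top, w, h in zip(ocr["text"], ocr["left"], ocr["top"],
                                     ocr["width"], ocr["height"]):
        if text.strip():
            # (top, bottom, left, right) box plus the recognized text
            print((top, top + h, left, left + w), text)
    ```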

  18. EMG from Combination Gestures with Ground-truth Joystick Labels

    • zenodo.org
    bin, zip
    Updated Jan 4, 2024
    Cite
    Niklas Smedemark-Margulies; Yunus Bicer; Elifnur Sunger; Stephanie Naufel; Tales Imbiriba; Eugene Tunik; Deniz Erdogmus; Mathew Yarossi (2024). EMG from Combination Gestures with Ground-truth Joystick Labels [Dataset]. http://doi.org/10.5281/zenodo.10393194
    Explore at:
    Available download formats: bin, zip
    Dataset updated
    Jan 4, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Niklas Smedemark-Margulies; Yunus Bicer; Elifnur Sunger; Stephanie Naufel; Tales Imbiriba; Eugene Tunik; Deniz Erdogmus; Mathew Yarossi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of surface EMG recordings from 11 subjects performing single and combination gestures, from "**A Multi-label Classification Approach to Increase Expressivity of EMG-based Gesture Recognition**" by Niklas Smedemark-Margulies, Yunus Bicer, Elifnur Sunger, Stephanie Naufel, Tales Imbiriba, Eugene Tunik, Deniz Erdogmus, and Mathew Yarossi.

    For more details and example usage, see the paper and the associated code repository.

    Contents

    Dataset of single and combination gestures from 11 subjects.
    Subjects participated in 13 experimental blocks.
    During each block, they followed visual prompts to perform gestures while also manipulating a joystick.
    Surface EMG was recorded from 8 electrodes on the forearm; labels were recorded according to the current visual prompt and the current state of the joystick.

    Experiments included the following blocks:

    • 1 Calibration block
    • 6 Simultaneous-Pulse Combination blocks (3 without feedback, 3 with feedback)
    • 6 Hold-Pulse Combination blocks (3 without feedback, 3 with feedback)

    The contents of each block type were as follows:

    • In the Calibration block, subjects performed 8 repetitions of each of the 4 direction gestures, 2 modifier gestures, and a resting pose.
      Each Calibration trial provided 160 overlapping examples, for a total of: 8 repetitions x 7 gestures x 160 examples = 8960 examples.
    • In Simultaneous-Pulse Combination blocks, subjects performed 8 trials of combination gestures, where both components were performed simultaneously.
      Each Simultaneous-Pulse trial provided 240 overlapping examples, for a total of: 8 trials x 240 examples = 1920 examples.
    • In Hold-Pulse Combination blocks, subjects performed 28 trials of combination gestures, where 1 gesture component was held while the other was pulsed.
      Each Hold-Pulse trial provided 240 overlapping examples, for a total of: 28 trials x 240 examples = 6720 examples.

    A single data example (from any block) corresponds to a 250 ms window of EMG recorded at 1926 Hz (with the built-in 20–450 Hz bandpass filtering applied).
    A 50 ms step size was used between consecutive windows; neighboring data examples therefore overlap.
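
    As a rough illustration of how these overlapping examples arise, the sketch below cuts a continuous multi-channel recording into 250 ms windows with a 50 ms hop. The exact windowing and rounding used by the authors may differ, and the array names are placeholders.

    ```python
    import numpy as np

    FS = 1926                      # sampling rate (Hz), per the description above
    WIN_MS, STEP_MS = 250, 50      # window length and hop (ms)

    def sliding_windows(emg, fs=FS, win_ms=WIN_MS, step_ms=STEP_MS):
        """Cut a (channels, samples) recording into overlapping windows.

        Returns an array of shape (n_windows, channels, window_samples),
        matching the (items, channels, timesteps) layout of data.npy.
        """
        win = int(round(fs * win_ms / 1000))    # ~482 samples per window
        step = int(round(fs * step_ms / 1000))  # ~96 samples between window starts
        n = 1 + (emg.shape[1] - win) // step
        return np.stack([emg[:, i * step:i * step + win] for i in range(n)])

    # Synthetic 8-channel, 13-second stand-in for one trial's EMG:
    demo = np.random.randn(8, 13 * FS)
    print(sliding_windows(demo).shape)
    ```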

    Feedback was provided as follows:

    • In blocks with feedback, a model pre-trained on the Calibration data was used to give real-time visual feedback during the trial.
    • In blocks without feedback, no model was used, and the visual prompt was the only source of information about the current gesture.

    For more details, see the paper.

    Labels

    Two types of labels are provided:

    • joystick labels were recorded based on the position of the joystick and are treated as ground truth.
    • visual labels were recorded based on the prompt currently being shown to the subject.

    For both joystick and visual labels, the following structure applies. Each gesture trial has a two-part label.

    The first label component describes the direction gesture, and takes values in {0, 1, 2, 3, 4}, with the following meaning:

    • 0 - "Up" (joystick pull)
    • 1 - "Down" (joystick push)
    • 2 - "Left" (joystick left)
    • 3 - "Right" (joystick right)
    • 4 - "NoDirection" (absence of a direction gesture; none of the above)

    The second label component describes the modifier gesture, and takes values in {0, 1, 2}, with the following meaning:

    • 0 - "Pinch" (joystick trigger button)
    • 1 - "Thumb" (joystick thumb button)
    • 2 - "NoModifier" (absence of a modifier gesture; none of the above)

    Examples of Label Structure

    Single gestures have labels like (0, 2) indicating ("Up", "NoModifier") or (4, 1) indicating ("NoDirection", "Thumb").

    Combination gestures have labels like (0, 0) indicating ("Up", "Pinch") or (2, 1) indicating ("Left", "Thumb").
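
    A tiny helper like the following (the function and list names are assumptions, not part of the dataset) turns a two-part label into readable text:

    ```python
    DIRECTIONS = ["Up", "Down", "Left", "Right", "NoDirection"]
    MODIFIERS = ["Pinch", "Thumb", "NoModifier"]

    def describe(label):
        """Map a (direction, modifier) integer pair to gesture names."""
        direction, modifier = label
        return (DIRECTIONS[direction], MODIFIERS[modifier])

    print(describe((0, 2)))   # ('Up', 'NoModifier')  - single gesture
    print(describe((2, 1)))   # ('Left', 'Thumb')     - combination gesture
    ```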

    File layout

    Data are provided in both NumPy and MATLAB formats; the descriptions below apply to both.

    Each experimental block is provided in a separate folder.
    Within one experimental block, the following files are provided:

    • `data.npy` - Raw EMG data, with shape (items, channels, timesteps).
    • `joystick_direction_labels.npy` - one-hot joystick direction labels, with shape (items, 5).
    • `joystick_modifier_labels.npy` - one-hot joystick modifier labels, with shape (items, 3).
    • `visual_direction_labels.npy` - one-hot visual direction labels, with shape (items, 5).
    • `visual_modifier_labels.npy` - one-hot visual modifier labels, with shape (items, 3).

    Loading data

    For example code snippets for loading data, see the associated code repository.
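
    The repository's snippets are the authoritative reference; as a minimal sketch (the block folder name below is hypothetical), one block's NumPy files could be read and the one-hot labels converted back to integers like this:

    ```python
    import numpy as np
    from pathlib import Path

    block = Path("Calibration")   # hypothetical folder name for one experimental block

    data = np.load(block / "data.npy")                             # (items, channels, timesteps)
    dir_onehot = np.load(block / "joystick_direction_labels.npy")  # (items, 5)
    mod_onehot = np.load(block / "joystick_modifier_labels.npy")   # (items, 3)

    direction = dir_onehot.argmax(axis=1)   # integers 0-4, see the mapping above
    modifier = mod_onehot.argmax(axis=1)    # integers 0-2, see the mapping above
    print(data.shape, direction[:5], modifier[:5])
    ```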

  19. Urban Sound & Sight (Urbansas) - Labeled set

    • zenodo.org
    • explore.openaire.eu
    txt, zip
    Updated Jun 20, 2022
    + more versions
    Cite
    Magdalena Fuentes; Bea Steers; Pablo Zinemanas; Martín Rocamora; Luca Bondi; Julia Wilkins; Qianyi Shi; Yao Hou; Samarjit Das; Xavier Serra; Juan Pablo Bello (2022). Urban Sound & Sight (Urbansas) - Labeled set [Dataset]. http://doi.org/10.5281/zenodo.6658386
    Explore at:
    txt, zipAvailable download formats
    Dataset updated
    Jun 20, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Magdalena Fuentes; Bea Steers; Pablo Zinemanas; Martín Rocamora; Luca Bondi; Julia Wilkins; Qianyi Shi; Yao Hou; Samarjit Das; Xavier Serra; Juan Pablo Bello
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Urban Sound & Sight (Urbansas):

    Version 1.0, May 2022

    Created by
    Magdalena Fuentes (1, 2), Bea Steers (1, 2), Pablo Zinemanas (3), Martín Rocamora (4), Luca Bondi (5), Julia Wilkins (1, 2), Qianyi Shi (2), Yao Hou (2), Samarjit Das (5), Xavier Serra (3), Juan Pablo Bello (1, 2)
    1. Music and Audio Research Lab, New York University
    2. Center for Urban Science and Progress, New York University
    3. Universitat Pompeu Fabra, Barcelona, Spain
    4. Universidad de la República, Montevideo, Uruguay
    5. Bosch Research, Pittsburgh, PA, USA

    Publication

    If using this data in academic work, please cite the following paper, which presented this dataset:
    M. Fuentes, B. Steers, P. Zinemanas, M. Rocamora, L. Bondi, J. Wilkins, Q. Shi, Y. Hou, S. Das, X. Serra, J. Bello. “Urban Sound & Sight: Dataset and Benchmark for Audio-Visual Urban Scene Understanding”. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.

    Description

    Urbansas is a dataset for the development and evaluation of machine listening systems for audiovisual spatial urban understanding. One of the main challenges in this field is the lack of realistic, labeled data for training and evaluating models on their ability to localize sound sources using a combination of audio and video.

    We set four main goals for creating this dataset:

    1. To compile a set of real-field audio-visual recordings;
    2. The recordings should be stereo, to allow exploring sound localization in the wild;
    3. The compilation should be varied in terms of scenes and recording conditions, to be meaningful for training and evaluating machine learning models;
    4. The labeled collection should be accompanied by a larger unlabeled collection with similar characteristics, to allow exploring self-supervised learning in urban contexts.

    Audiovisual data

    We have compiled and manually annotated Urbansas from two publicly available datasets, plus the addition of unreleased material. The public datasets are the TAU Urban Audio-Visual Scenes 2021 Development dataset (street-traffic subset) and the Montevideo Audio-Visual Dataset (MAVD):


    Wang, Shanshan, et al. "A curated dataset of urban scenes for audio-visual scene analysis." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.

    Zinemanas, Pablo, Pablo Cancela, and Martín Rocamora. "MAVD: A dataset for sound event detection in urban environments." Detection and Classification of Acoustic Scenes and Events, DCASE 2019, New York, NY, USA, 25–26 oct, page 263--267 (2019).


    The TAU dataset consists of 10-second segments of audio and video from different scenes across European cities, traffic being one of the scenes; only the scenes labeled as traffic are included in Urbansas. MAVD is an audio-visual traffic dataset curated at different locations in Montevideo, Uruguay, with annotations of vehicle and vehicle-component sounds (e.g. engine, brakes) for sound event detection. Besides the published datasets, we include a total of 9.5 hours of unpublished material recorded in Montevideo with the same recording devices as MAVD but covering new locations and scenes.

    Recordings for TAU were acquired using a GoPro Hero 5 (30fps, 1280x720) and a Soundman OKM II Klassik/studio A3 electret binaural in-ear microphone with a Zoom F8 audio recorder (48kHz, 24 bits, stereo). Recordings for MAVD were collected using a GoPro Hero 3 (24fps, 1920x1080) and a SONY PCM-D50 recorder (48kHz, 24 bits, stereo).

    In total, Urbansas includes 15 hours of stereo audio and video, stored in separate 10-second MPEG4 (1280x720, 24fps) and WAV (48kHz, 24 bit, 2 channel) files. Both released video datasets are already anonymized to obscure people and license plates; the unpublished MAVD material was anonymized similarly with the same anonymization tool. We also distribute the 2fps video used for producing the annotations.

    The audio and video files both share the same filename stem, meaning that they can be associated after removing the parent directory and extension.
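
    For example, the pairing can be done by matching stems; the directory names and file extensions below are assumptions based on the formats listed above, not fixed by the dataset description.

    ```python
    from pathlib import Path

    video_dir, audio_dir = Path("video"), Path("audio")   # assumed directory names

    # Pair each video clip with its audio clip via the shared filename stem.
    pairs = {p.stem: (p, audio_dir / f"{p.stem}.wav")
             for p in video_dir.glob("*.mp4")}
    ```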

    MAVD:
    video/

    TAU:
    video/


    where location_id in both cases includes the city and an ID number.


    | city | places | clips | mins | frames | labeled mins |
    | --- | --- | --- | --- | --- | --- |
    | Montevideo | 8 | 4085 | 681 | 980400 | 92 |
    | Stockholm | 3 | 91 | 15 | 21840 | 2 |
    | Barcelona | 4 | 144 | 24 | 34560 | 24 |
    | Helsinki | 4 | 144 | 24 | 34560 | 16 |
    | Lisbon | 4 | 144 | 24 | 34560 | 19 |
    | Lyon | 4 | 144 | 24 | 34560 | 6 |
    | Paris | 4 | 144 | 24 | 34560 | 2 |
    | Prague | 4 | 144 | 24 | 34560 | 2 |
    | Vienna | 4 | 144 | 24 | 34560 | 6 |
    | London | 5 | 144 | 24 | 34560 | 4 |
    | Milan | 6 | 144 | 24 | 34560 | 6 |
    | Total | 50 | 5472 | 912 | 1.3M | 180 |


    Annotations


    Of the 15 hours of audio and video, 3 hours (1.5 hours TAU, 1.5 hours MAVD) were manually annotated by our team in both the audio and the image frames; the remaining 12 hours (2.5 hours TAU, 9.5 hours of unpublished material) are left unlabeled for the benefit of unsupervised models. The distribution of clips across locations was chosen to maximize variance across scenes. Annotations were collected at 2 frames per second (FPS), which balances temporal granularity against clip coverage.

    The annotation data is contained in video_annotations.csv and audio_annotations.csv.

    Video Annotations

    Each row in the video annotations represents a single object in a single frame of the video. The annotation schema is as follows:

    • frame_id: The index of the frame within the clip the annotation is associated with. This index is 0-based and goes up to 19 (assuming 10-second clips with annotations at 2 FPS)
    • track_id: The ID of the detected instance that identifies the same object across different frames. These IDs are guaranteed to be unique within a clip.
    • x, y, w, h: The top-left corner and width and height of the object’s bounding box in the video. The values are given in absolute coordinates with respect to the image size (1280x720).
    • class_id: The class index, taking values in [0, 1, 2, 3, -1]; see label for the index-to-name mapping. A value of -1 marks rows that carry only clip-level annotations (such as night and city) and no object; filter out class_id == -1 when working with bounding boxes.
    • label: The label text, equivalent to LABELS[class_id] with LABELS = [car, bus, motorbike, truck, -1]. The label -1 plays the same clip-level role as above.
    • visibility: The visibility of the object. This is 1 unless the object becomes obstructed, where it changes to 0.
    • filename: The file ID of the associated file. This is the file’s path minus the parent directory and extension.
    • city: The city where the clip was collected in.
    • location_id: The specific name of the location. This may include an integer ID following the city name for cases where there are multiple collection points.
    • time: The time (in seconds) of the annotation, relative to the start of the file. Equivalent to frame_id / fps .
    • night: Whether the clip takes place during the day or at night. This value is singular per clip.
    • subset: Which data source the data originally belongs to (TAU or MAVD).
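
    As a rough sketch of working with this schema (only the columns listed above and pandas are assumed), the clip-level rows can be filtered out and the boxes normalized like so:

    ```python
    import pandas as pd

    video_ann = pd.read_csv("video_annotations.csv")

    # Drop clip-level rows; class_id == -1 carries no bounding box.
    boxes = video_ann[video_ann["class_id"] != -1].copy()

    # Convert absolute pixel coordinates (1280x720 frames) to normalized [0, 1].
    boxes[["x", "w"]] = boxes[["x", "w"]] / 1280.0
    boxes[["y", "h"]] = boxes[["y", "h"]] / 720.0

    # One trajectory per (filename, track_id): how many frames each object is annotated in.
    traj_len = boxes.groupby(["filename", "track_id"])["frame_id"].count()
    print(traj_len.describe())
    ```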

    Audio Annotations

    Each row represents a single object instance, along with the time range that it exists within the clip. The annotation schema is as follows:

    • filename: The file ID of the associated audio file; see filename above.
    • class_id, label: See above. Audio has an additional class_id of 4 (label = offscreen), indicating an off-screen vehicle, i.e. a vehicle that is heard but not seen. A class_id of -1 indicates a clip-level annotation for a clip with no object annotations (an empty scene).
    • non_identifiable_vehicle_sound: True if the region contains the sound of vehicles where individual instances cannot be uniquely identified.
    • start, end: The start and end times (in seconds) of the annotation relative to the file.
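
    A similar sketch for the audio annotations (again assuming only the columns above) computes event durations and isolates off-screen vehicles:

    ```python
    import pandas as pd

    audio_ann = pd.read_csv("audio_annotations.csv")

    events = audio_ann[audio_ann["class_id"] >= 0].copy()   # drop clip-level (-1) rows
    events["duration"] = events["end"] - events["start"]    # seconds

    offscreen = events[events["class_id"] == 4]             # heard but not seen
    print(len(offscreen), "off-screen vehicle events;",
          "median event duration", round(events["duration"].median(), 2), "s")
    ```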

    Conditions of use

    Dataset created by Magdalena Fuentes, Bea Steers, Pablo Zinemanas, Martín Rocamora, Luca Bondi, Julia Wilkins, Qianyi Shi, Yao Hou, Samarjit Das, Xavier Serra, and Juan Pablo Bello.

    The Urbansas dataset is offered free of charge under the following terms:

    • Urbansas annotations are released under the CC BY 4.0 license
    • Urbansas video and audio inherit the licenses of their original sources:
      • MAVD subset is released under CC BY 4.0
      • TAU subset is released under a Non-Commercial license

    Feedback

    Please help us improve Urbansas by sending your feedback to:

    • Magdalena Fuentes: mfuentes@nyu.edu
    • Bea Steers: bsteers@nyu.edu

    In case of a problem, please include as many details as possible.

    Acknowledgments

    This work was partially supported by the National Science Foundation.

  20. Fish Detection AI, Optic and Sonar-trained Object Detection Models

    • mhkdr.openei.org
    • data.openei.org
    • +1more
    archive +2
    Updated Jun 25, 2014
    + more versions
    Cite
    Katherine Slater; Delano Yoder; Carlos Noyes; Brett Scott (2014). Fish Detection AI, Optic and Sonar-trained Object Detection Models [Dataset]. https://mhkdr.openei.org/submissions/600
    Explore at:
    website, archive, text_documentAvailable download formats
    Dataset updated
    Jun 25, 2014
    Dataset provided by
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Renewable Power Office. Water Power Technologies Office (EE-4WP)
    Marine and Hydrokinetic Data Repository
    Water Power Technology Office
    Authors
    Katherine Slater; Delano Yoder; Carlos Noyes; Brett Scott
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Fish Detection AI project aims to improve the efficiency of fish monitoring around marine energy facilities to comply with regulatory requirements. Despite advancements in computer vision, there is limited focus on sonar images, identifying small fish with unlabeled data, and methods for underwater fish monitoring for marine energy.

    A YOLO (You Only Look Once) computer vision model was developed to identify fish in underwater environments, using the Eyesea dataset (optical) and sonar images from Alaska Fish and Games. The YOLO models were trained in a supervised fashion on labeled fish data and then applied to unseen datasets, with the aim of reducing the need to label data and train new models for each new location. Hyper-image analysis and various image preprocessing methods were also explored to enhance fish detection.

    In this research we achieved enhanced YOLO performance relative to a published article (Xu, Matzner 2018) that used earlier YOLO versions for fish object identification: a best mean Average Precision (mAP) of 0.68 on the Eyesea optical dataset using YOLO v8 (medium-sized model), surpassing that publication's YOLO v3 benchmarks. We further demonstrated up to 0.65 mAP on unseen sonar domains by leveraging a hyper-image approach (stacking consecutive frames), showing promising cross-domain adaptability.
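
    The description does not spell out how frames were stacked; as one plausible reading (an assumption, not the authors' documented method), a hyper-image can be formed by concatenating k consecutive single-channel frames along a channel axis:

    ```python
    import numpy as np

    def hyper_image(frames, k=3):
        """Stack k consecutive single-channel frames along a channel axis.

        frames: (n_frames, H, W) array. Returns (n_frames - k + 1, H, W, k).
        The exact grouping used in the study is assumed here for illustration.
        """
        return np.stack([frames[i:i + k].transpose(1, 2, 0)
                         for i in range(len(frames) - k + 1)])

    sonar = np.random.rand(10, 480, 640)     # synthetic stand-in for sonar frames
    print(hyper_image(sonar).shape)          # (8, 480, 640, 3)
    ```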

    This submission of data includes:

    • The best-performing trained YOLO model weights (PyTorch .pt files), which can be applied directly for object detection; these are found in the Yolo_models_downloaded zip file. (A usage sketch follows this list.)
    • A documentation file ("Yolo_Object_Detection_How_To_Document.docx") explaining the upload and the goals of each of experiments 1-5.
    • Coding files: 5 sub-folders of Python, shell, and YAML files used to run experiments 1-5, plus a separate folder for the YOLO models; each is provided in its own zip file named after the corresponding experiment.
    • Sample data structures (sample1 and sample2, each in its own zip file) showing how the raw data should be structured after running our provided code on the raw downloaded data.
    • A link to the article we were replicating (Xu, Matzner 2018).
    • A link to the YOLO documentation site from the original creators of the model (Ultralytics).
    • A link to the downloadable EyeSea data set from PNNL (instructions on how to download and format the data to replicate these experiments are in the How To document).
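
    As a usage sketch for the provided weights, using the Ultralytics package the description points to (the weight and image filenames below are placeholders):

    ```python
    from ultralytics import YOLO   # pip install ultralytics

    # Placeholder filename; substitute one of the .pt files from Yolo_models_downloaded.
    model = YOLO("fish_yolov8m_best.pt")

    # Run detection on one underwater frame (path is illustrative).
    results = model.predict("sample_frame.png", conf=0.25)
    for r in results:
        for box in r.boxes:
            print(int(box.cls.item()), float(box.conf.item()), box.xyxy.tolist())
    ```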
