97 datasets found
  1. Label Classifier Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 31, 2025
    Cite
    Data Insights Market (2025). Label Classifier Report [Dataset]. https://www.datainsightsmarket.com/reports/label-classifier-504593
    Explore at:
    Available download formats: doc, ppt, pdf
    Dataset updated
    May 31, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Label Classifier market is experiencing robust growth, driven by the increasing adoption of machine learning and artificial intelligence across diverse sectors. The market's expansion is fueled by the need for efficient and accurate data annotation and classification in applications ranging from image recognition and natural language processing to medical diagnosis and fraud detection. The rising volume of unstructured data and the need for automated data analysis are key catalysts for this growth. While precise market sizing data wasn't provided, considering the involvement of major tech players like Google, Microsoft, and Amazon, along with specialized AI companies, a reasonable estimate for the 2025 market size could be in the range of $500 million to $1 billion, depending on the specific definition of "Label Classifier" and the inclusion of related technologies. A Compound Annual Growth Rate (CAGR) of 25-30% over the forecast period (2025-2033) seems realistic given the current technological advancements and market demand.

    This growth is anticipated to continue, fueled by several factors. Advancements in deep learning algorithms, improved computational power, and the availability of larger datasets are enhancing the accuracy and efficiency of label classifiers. Furthermore, the increasing demand for automation in various industries, coupled with the growing need for real-time insights from data, will propel the market forward. However, challenges such as data security concerns, the need for skilled professionals to develop and maintain these systems, and the high computational costs associated with complex label classifiers could potentially act as restraints.

    The market is segmented based on deployment (cloud, on-premise), application (image recognition, text analysis, etc.), and industry (healthcare, finance, etc.). Key players are actively investing in research and development, expanding their product portfolios, and forging strategic partnerships to maintain a competitive edge in this rapidly evolving market. The competitive landscape is dynamic, with both established tech giants and specialized AI startups vying for market share.

  2. FDA Online Label Repository

    • catalog.data.gov
    • healthdata.gov
    • +5more
    Updated Jul 11, 2025
    Cite
    U.S. Food and Drug Administration (2025). FDA Online Label Repository [Dataset]. https://catalog.data.gov/dataset/fda-online-label-repository
    Explore at:
    Dataset updated
    Jul 11, 2025
    Dataset provided by
    Food and Drug Administration (http://www.fda.gov/)
    Description

    The drug labels and other drug-specific information on this Web site represent the most recent drug listing information companies have submitted to the Food and Drug Administration (FDA). (See 21 CFR part 207.) The drug labeling and other information has been reformatted to make it easier to read, but its content has neither been altered nor verified by FDA. The drug labeling on this Web site may not be the labeling on currently distributed products or identical to the labeling that is approved. Most OTC drugs are not reviewed and approved by FDA; however, they may be marketed if they comply with applicable regulations and policies described in monographs. Drugs marked 'OTC monograph final' or 'OTC monograph not final' are not checked for conformance to the monograph. Drugs marked 'unapproved medical gas', 'unapproved homeopathic' or 'unapproved drug other' on this Web site have not been evaluated by FDA for safety and efficacy and their labeling has not been approved. In addition, FDA is not aware of scientific evidence to support homeopathy as effective.

  3. FSDnoisy18k

    • explore.openaire.eu
    • opendatalab.com
    • +3more
    Updated Jan 3, 2019
    Cite
    Eduardo Fonseca; Mercedes Collado; Manoj Plakal; Daniel P. W. Ellis; Frederic Font; Xavier Favory; Xavier Serra (2019). FSDnoisy18k [Dataset]. http://doi.org/10.5281/zenodo.2529933
    Explore at:
    Dataset updated
    Jan 3, 2019
    Authors
    Eduardo Fonseca; Mercedes Collado; Manoj Plakal; Daniel P. W. Ellis; Frederic Font; Xavier Favory; Xavier Serra
    Description

    FSDnoisy18k is an audio dataset collected with the aim of fostering the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.

    Data curators: Eduardo Fonseca and Mercedes Collado. Contact: You are welcome to contact Eduardo Fonseca should you have any questions at eduardo.fonseca@upf.edu.

    Citation: If you use this dataset or part of it, please cite the following ICASSP 2019 paper: Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, and Xavier Serra, “Learning Sound Event Classifiers from Web Audio with Noisy Labels”, arXiv preprint arXiv:1901.01189, 2019. You can also consider citing our ISMIR 2017 paper that describes the Freesound Annotator, which was used to gather the manual annotations included in FSDnoisy18k: Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, “Freesound Datasets: A Platform for the Creation of Open Audio Datasets”, in Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017.

    FSDnoisy18k description: What follows is a summary of the most basic aspects of FSDnoisy18k. For a complete description, check the FSDnoisy18k companion site (http://www.eduardofonseca.net/FSDnoisy18k/) and the description provided in Section 2 of our ICASSP 2019 paper. The source of audio content is Freesound, a sound sharing site created and maintained by the Music Technology Group that hosts over 400,000 clips uploaded by its community of users, who additionally provide some basic metadata (e.g., tags and title). The 20 classes of FSDnoisy18k are drawn from the AudioSet Ontology and are selected based on data availability as well as on their suitability to allow the study of label noise. The 20 classes are: "Acoustic guitar", "Bass guitar", "Clapping", "Coin (dropping)", "Crash cymbal", "Dishes, pots, and pans", "Engine", "Fart", "Fire", "Fireworks", "Glass", "Hi-hat", "Piano", "Rain", "Slam", "Squeak", "Tearing", "Walk, footsteps", "Wind", and "Writing". FSDnoisy18k was created with the Freesound Annotator, a platform for the collaborative creation of open audio datasets.

    We defined a clean portion of the dataset consisting of correct and complete labels; the remaining portion is referred to as the noisy portion. Each clip in the dataset has a single ground truth label (singly-labeled data). The clean portion consists of audio clips whose labels are rated as present in the clip and predominant (almost all with full inter-annotator agreement), meaning that the label is correct and, in most cases, there is no additional acoustic material other than the labeled class. A few clips may contain some additional sound events, but they occur in the background and do not belong to any of the 20 target classes. This is more common for some classes that rarely occur alone, e.g., “Fire”, “Glass”, “Wind” or “Walk, footsteps”. The noisy portion consists of audio clips that received no human validation; they are categorized on the basis of the user-provided tags in Freesound, and hence this portion features a certain amount of label noise.

    Code: We've released the code for our ICASSP 2019 paper at https://github.com/edufonseca/icassp19. The framework comprises all the basic stages: feature extraction, training, inference and evaluation. After loading the FSDnoisy18k dataset, log-mel energies are computed and a CNN baseline is trained and evaluated. The code also allows testing four noise-robust loss functions. Please check our paper for more details.

    Label noise characteristics: FSDnoisy18k features real label noise that is representative of audio data retrieved from the web, particularly from Freesound. The analysis of a per-class, random 15% of the noisy portion of FSDnoisy18k revealed that roughly 40% of the analyzed labels are correct and complete, whereas 60% of the labels show some type of label noise. Please check the FSDnoisy18k companion site for a detailed characterization of the label noise in the dataset, including a taxonomy of label noise for singly-labeled data as well as a per-class description of the label noise.

    Basic characteristics: FSDnoisy18k contains 18,532 audio clips (42.5 h) unequally distributed across the 20 aforementioned classes drawn from the AudioSet Ontology. The audio clips are provided as uncompressed PCM 16 bit, 44.1 kHz, mono audio...
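    The Code paragraph above mentions that log-mel energies are computed before the CNN baseline is trained. As a rough illustration only (not the authors' exact feature-extraction settings), a log-mel spectrogram for a single clip could be computed with librosa; the clip name and all parameters below are assumptions.

    ```python
    # Hedged sketch: log-mel energies for one FSDnoisy18k clip.
    # Window/hop/mel settings are illustrative, not the paper's values.
    import librosa

    clip_path = "FSDnoisy18k.audio_train/17.wav"  # hypothetical file name

    y, sr = librosa.load(clip_path, sr=44100, mono=True)   # clips are 44.1 kHz mono PCM
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                         hop_length=512, n_mels=96)
    log_mel = librosa.power_to_db(mel)                      # log-mel energies
    print(log_mel.shape)  # (n_mels, n_frames)
    ```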

  4. In Mold Labelling Market Analysis, Size, and Forecast 2025-2029: Europe...

    • technavio.com
    Cite
    Technavio, In Mold Labelling Market Analysis, Size, and Forecast 2025-2029: Europe (France, Germany, Italy, Spain, UK), North America (US and Canada), APAC (China, Japan), Middle East and Africa, and South America (Brazil) [Dataset]. https://www.technavio.com/report/in-mold-labelling-market-industry-analysis
    Explore at:
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Canada, United States, Germany, Global
    Description


    In Mold Labelling Market Size 2025-2029

    The in mold labelling market size is forecast to increase by USD 1.05 billion at a CAGR of 5.1% between 2024 and 2029.

    The In Mold Labelling (IML) market is experiencing significant growth due to the increasing global manufacturing output, particularly in sectors such as automotive, packaging, and electronics. IML technology offers several advantages, including improved product aesthetics, reduced material usage, and enhanced branding capabilities. However, the high initial investments required for IML equipment and tooling can act as a barrier to entry for some companies. Key market trends include the increasing adoption of digital technologies, such as 3D design and simulation software, to optimize the IML design process. Additionally, the growing demand for sustainable labeling solutions is driving innovation in the market, with biodegradable and recyclable IML materials gaining popularity. The IML market is also benefiting from increasing production output in various industries, particularly in the spheres of spa, frozen food, packaging, personal care, and cosmetics.
    Companies seeking to capitalize on these opportunities must stay abreast of technological advancements and market trends while navigating the challenges of high upfront costs and regulatory compliance. By investing in research and development and forming strategic partnerships, companies can differentiate themselves in the competitive IML market and secure a strong market position.
    

    What will be the Size of the In Mold Labelling Market during the forecast period?


    The in mold labeling market in the United States is experiencing significant growth, driven by the increasing demand for labeling solutions that offer superior durability, resistance, and sustainability. Key market dynamics include labeling data management for efficient production and supply chain tracking, labeling recycling and waste reduction, and labeling traceability for enhanced product safety and regulatory compliance. Chemical-resistant labels, label resistance, and labeling automation software are critical trends, enabling manufacturers to streamline processes and reduce production costs. Labeling system integration, labeling industry leaders, and high-definition printing are also driving innovation, with advancements in label durability testing, holographic labels, glossy labels, UV curing, heat-resistant labels, label peelability, scratch-resistant labels, waterproof labels, and labeling market forecasts. IML utilizes polypropylene as the label material, enabling multi-colored prints and intricate designs.
    Functional labels, such as tactile, embossed, and matte labels, are gaining popularity due to their aesthetic appeal and added functionality. Decorative labels, metallic labels, and embossed labels are also increasingly being used for brand differentiation and consumer appeal. Labeling market analysis indicates continued growth, with a focus on labeling sustainability assessment, label removal, and labeling upcycling. The market is expected to remain competitive, with ongoing innovation trends in labeling technology and certification standards. Overall, the in mold labeling market is a dynamic and evolving industry, responding to the changing needs of consumers and businesses alike. Eco-friendly options and automation are also driving the growth of the IML market, ensuring its continued relevance as a branding tool in today's competitive business landscape.
    

    How is this In Mold Labelling Industry segmented?

    The in mold labelling industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Technology
    
      Injection molding
      Blow molding
      Thermoforming
    
    
    End-user
    
      Food and beverage
      Cosmetics
      Pharmaceuticals
      Others
    
    
    Material
    
      Polypropylene
      Polyethylene
      Polyvinyl chloride
      Acrylonitrile butadiene styrene
      Others
    
    
    Geography
    
      Europe
    
        France
        Germany
        Italy
        Spain
        UK
    
    
      North America
    
        US
        Canada
    
    
      APAC
    
        China
        Japan
    
    
      Middle East and Africa
    
    
    
      South America
    
        Brazil
    

    By Technology Insights

    The injection molding segment is estimated to witness significant growth during the forecast period. The in mold labeling market experiences significant growth due to various factors. One of these factors is the increasing demand for labeling in various industries, including healthcare, packaging, automobile, consumer goods, and electronics. Injection molding machines, a crucial component in the in mold labeling process, are in high demand due to their versatility and efficiency. These machines, consisting of an injection unit and a clamping unit, enable th

  5. Data from: ImageNet Dataset

    • paperswithcode.com
    Updated Feb 2, 2021
    Cite
    Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Fei-Fei Li (2021). ImageNet Dataset [Dataset]. https://paperswithcode.com/dataset/imagenet
    Explore at:
    Dataset updated
    Feb 2, 2021
    Authors
    Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Fei-Fei Li
    Description

    The ImageNet dataset contains 14,197,122 annotated images organized according to the WordNet hierarchy. Since 2010 the dataset has been used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images, therefore only thumbnails and URLs of images are provided.

    Total number of non-empty WordNet synsets: 21,841
    Total number of images: 14,197,122
    Number of images with bounding box annotations: 1,034,908
    Number of synsets with SIFT features: 1,000
    Number of images with SIFT features: 1.2 million

  6. Data from: Processed Lab Data for Neural Network-Based Shear Stress Level...

    • catalog.data.gov
    • data.openei.org
    • +3more
    Updated Jan 20, 2025
    Cite
    Pennsylvania State University (2025). Processed Lab Data for Neural Network-Based Shear Stress Level Prediction [Dataset]. https://catalog.data.gov/dataset/processed-lab-data-for-neural-network-based-shear-stress-level-prediction-309d2
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    Pennsylvania State University
    Description

    Machine learning can be used to predict fault properties such as shear stress, friction, and time to failure using continuous records of fault zone acoustic emissions. The files are extracted features and labels from lab data (experiment p4679). The features are extracted with a non-overlapping window from the original acoustic data. The first column is the time of the window. The second and third columns are the mean and the variance of the acoustic data in this window, respectively. The 4th-11th columns are the power spectral density ranging from low to high frequency. The last column is the corresponding label (shear stress level). The file name indicates which driving velocity the sequence was generated from.

    Data were generated from laboratory friction experiments conducted with a biaxial shear apparatus. Experiments were conducted in the double direct shear configuration in which two fault zones are sheared between three rigid forcing blocks. Our samples consisted of two 5-mm-thick layers of simulated fault gouge with a nominal contact area of 10 by 10 cm^2. Gouge material consisted of soda-lime glass beads with initial particle size between 105 and 149 micrometers. Prior to shearing, we impose a constant fault normal stress of 2 MPa using a servo-controlled load-feedback mechanism and allow the sample to compact. Once the sample has reached a constant layer thickness, the central block is driven down at a constant rate of 10 micrometers per second. In tandem, we collect an AE signal continuously at 4 MHz from a piezoceramic sensor embedded in a steel forcing block about 22 mm from the gouge layer. The data from this experiment can be used with the deep learning algorithm to train it for future fault property prediction.
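    As a minimal illustration of the column layout described above, one of the feature files could be read as follows; the file name and delimiter are assumptions, since only the column order is documented here.

    ```python
    # Hedged sketch: parse an extracted-feature file from experiment p4679.
    # Columns: time, mean, variance, 8 PSD bins (low to high frequency), label.
    import numpy as np

    data = np.loadtxt("p4679_features_v10.txt", delimiter=",")  # hypothetical name/format

    time     = data[:, 0]      # window timestamp
    mean_amp = data[:, 1]      # mean of the acoustic signal in the window
    var_amp  = data[:, 2]      # variance of the acoustic signal in the window
    psd      = data[:, 3:11]   # power spectral density, columns 4-11
    label    = data[:, 11]     # shear stress level
    print(psd.shape, np.unique(label))
    ```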

  7. Replication Data for: Detecting Voter Understanding of Ideological Labels...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 14, 2023
    Cite
    Miwa, Hirofumi; Arami, Reiko; Taniguchi, Masaki (2023). Replication Data for: Detecting Voter Understanding of Ideological Labels Using a Conjoint Experiment [Dataset]. http://doi.org/10.7910/DVN/FIHGN0
    Explore at:
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Miwa, Hirofumi; Arami, Reiko; Taniguchi, Masaki
    Description

    Understanding voters’ conception of ideological labels is critical for political behavioral research. Conventional research designs have several limitations, such as endogeneity, insufficient responses to open-ended questions, and inseparability of composite treatment effects. To address these challenges, we propose a conjoint experiment to study the meanings ascribed to ideological labels in terms of policy positions. We also suggest using a mixture model approach to explore heterogeneity in voters’ understandings of ideological labels, as well as the average interpretation of labels. We applied these approaches to conceptions of left–right labels in Japan, where the primary issue of elite-level conflicts has been distinctive compared with other developed countries. We found that, on average, while Japanese voters understand policy-related meanings of “left” and “right,” they primarily associate these labels with security and nationalism, and, secondarily, with social issues; they do not associate these labels with economic issues. Voters’ understandings partly depend on their birth cohort, but observed patterns do not necessarily coincide with what many researchers would predict regarding generational differences in Japanese politics. Mixture model results suggest that some individuals tend to associate left–right labels with security and nationalism policies, while others link them to social policies. Over one-third of respondents seemed to barely understand the usage of left–right labels in policy positions. Our study improves upon existing methods for measuring voter understanding of ideological labels, and reconfirms the global diversity of meanings associated with left–right labels.

  8. Labeled Temporal Brain Networks

    • entrepot.recherche.data.gouv.fr
    txt, zip
    Updated Jul 21, 2023
    Cite
    Aurora ROSSI (2023). Labeled Temporal Brain Networks [Dataset]. http://doi.org/10.57745/HHNT10
    Explore at:
    Available download formats: txt (1498), zip (648811279)
    Dataset updated
    Jul 21, 2023
    Dataset provided by
    Recherche Data Gouv
    Authors
    Aurora ROSSI
    License

    https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.57745/HHNT10

    Dataset funded by
    French government, National Research Agency (ANR)
    Description

    Labeled Temporal Brain Networks

    This dataset contains a collection of temporal brain networks of 100 subjects. Each subject has a label representing their biological sex ("M" for male and "F" for female) and age range (22-25, 26-30, 31-35 and 36+). The networks are obtained from resting-state fMRI data from the Human Connectome Project (HCP) and are undirected and weighted. The number of nodes is fixed at 202, while the edge weights change their values over time.

    Dataset structure: The networks.zip file contains the networks as .txt files in the following format: the first line of each .txt file contains the number of nodes and the number of snapshots of the network, separated by a space. The following lines contain the list of edges of the network in the form i,j,t,w, meaning that the edge between node i and node j at time t has weight w. The labels are contained in the file labels.txt, which has three columns separated by a space: the first column is the identifier of a subject, the second is the biological sex, and the last is an age range.

    Acknowledgments: Data were provided by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research, and by the McDonnell Center for Systems Neuroscience at Washington University. The authors are grateful to the OPAL infrastructure from Université Côte d'Azur for providing resources and support. This work has been supported by the French government, through the UCA DS4H Investments in the Future project managed by the National Research Agency (ANR) with the reference number ANR-17-EURE-0004.
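    Since the file format is spelled out above (a header line with the number of nodes and snapshots, then one i,j,t,w edge per line, plus a space-separated labels.txt), a minimal parser might look like the sketch below; the subject file name is a hypothetical example.

    ```python
    # Hedged sketch: read one temporal network and the label file.
    def read_temporal_network(path):
        with open(path) as fh:
            n_nodes, n_snapshots = map(int, fh.readline().split())
            edges = []
            for line in fh:
                if line.strip():
                    i, j, t, w = line.strip().split(",")
                    edges.append((int(i), int(j), int(t), float(w)))
        return n_nodes, n_snapshots, edges

    def read_labels(path="labels.txt"):
        # columns: subject identifier, biological sex ("M"/"F"), age range
        with open(path) as fh:
            return [tuple(line.split()) for line in fh if line.strip()]

    n_nodes, n_snapshots, edges = read_temporal_network("100206.txt")  # hypothetical file
    print(n_nodes, n_snapshots, len(edges))
    ```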

  9. Lig-PCDB: Labeled Databases of X-ray Ligands Images in 3D Point Clouds and...

    • zenodo.org
    Updated Apr 24, 2025
    Cite
    Cristina F Bazzano; Daniela B. B. Trivella; Guilherme P. Telles; Luiz G. Alves (2025). Lig-PCDB: Labeled Databases of X-ray Ligands Images in 3D Point Clouds and Validated Deep Learning Models [Dataset]. http://doi.org/10.5281/zenodo.7872578
    Explore at:
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Cristina F Bazzano; Daniela B. B. Trivella; Guilherme P. Telles; Luiz G. Alves
    Description

    LigPCDS: Labeled Dataset of X-ray Protein Ligand 3D Images in Point Clouds and Validated Deep Learning Models

    The difference electron density from X-ray protein crystallography was used to create the first dataset of labeled images of ligands in 3D point clouds, named LigPCDS.

    Four proposed vocabularies were validated by successfully training good-performance deep learning models for the semantic segmentation of a stratified dataset from Lig-PCDB. The data from organic molecules (ligands) was obtained from the worldwide Protein Data Bank with resolutions ranging from 1.5 to 2.2 Å. The ligands' images were interpolated from their calculated difference electron density map in a 3D grid-like bounding box around their atomic positions, and stored in point clouds. A grid spacing of 0.5 Å gave the best results. The density value of the grid points was used as the feature. The labeling approach used the structure of the ligands to propose vocabularies of chemical classes based on the chemical atoms themselves and their cyclic substructures. These annotations were applied pointwise to the ligands' images using an atomic sphere model. The databases and validated models may be used to tackle problems regarding known and unknown ligand building in drug discovery and fragment screening pipelines.
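    The atomic sphere labeling mentioned above can be pictured with a small conceptual sketch: a grid point inherits an atom's class if it falls inside that atom's sphere. The radii and class encoding below are illustrative assumptions, not the values used to build Lig-PCDB.

    ```python
    # Conceptual sketch of pointwise labeling with an atomic sphere model.
    import numpy as np

    def label_points(points, atom_xyz, atom_class, atom_radius):
        """points: (N, 3) grid coordinates; atom_*: per-atom (M, 3), (M,), (M,) arrays."""
        labels = np.zeros(len(points), dtype=int)             # 0 = unlabeled background
        for xyz, cls, radius in zip(atom_xyz, atom_class, atom_radius):
            inside = np.linalg.norm(points - xyz, axis=1) <= radius
            labels[inside] = cls                               # points inside the sphere get the atom's class
        return labels
    ```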

    The four validated deep learning models are: (i) the LigandRegion, composed of generic atoms of any type; (ii) the AtomCycle, composed of generic atoms outside cycles and generic cycles; (iii) the AtomC347CA56, composed of generic atoms outside cycles, non-aromatic cycles of sizes 3 to 7, and aromatic cycles of sizes 5 and 6; and (iv) the AtomSymbolGroups, composed of the atom symbols with groupings. The mean accuracy of these models in their cross-validation was between 49.7% and 77.4% in terms of the Intersection over Union (mIoU) metric and between 62.4% and 87.0% in F1-score (mF1).

    The code used to create and validate Lig-PCDB is available at the following repository: https://github.com/danielatrivella/np3_ligand

    This repository also contains the NP³ Blob Label application for ligand building using the validated deep learning models from Lig-PCDB.

    License

    LigPCDS by Cristina Freitas Bazzano, Luiz G. Alves, Guilherme P. Telles, Daniela B. B. Trivella is marked with CC0 1.0 Universal.

  10. ramp Building Footprint Dataset - N'Djamena, Chad

    • access.earthdata.nasa.gov
    • cmr.earthdata.nasa.gov
    Updated Oct 10, 2023
    Cite
    (2023). ramp Building Footprint Dataset - N'Djamena, Chad [Dataset]. http://doi.org/10.34911/rdnt.b0noju
    Explore at:
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Area covered
    Description

    This chipped training dataset is over N'Djamena and includes high-resolution imagery (.tif format) and corresponding building footprint vector labels (.geojson format) in 256 x 256 pixel tile/label pairs. This dataset is a ramp Tier 2 dataset, meaning it has NOT been thoroughly reviewed and improved. This dataset was produced for the ramp project and contains 3,044 tiles and 124,208 individual buildings. The satellite imagery resolution is 45 cm and was sourced from Maxar ODP (10300100AA405C00). Dataset keywords: Urban, Peri-urban, Rural
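    A minimal sketch for opening one imagery tile and its building-footprint labels is shown below; the chip and label file names are assumptions, and rasterio/geopandas are just one convenient way to read .tif and .geojson pairs.

    ```python
    # Hedged sketch: load one 256 x 256 tile/label pair from the ramp dataset.
    import rasterio            # reads the .tif imagery chip
    import geopandas as gpd    # reads the .geojson building footprints

    with rasterio.open("chips/ndjamena_0001.tif") as src:      # hypothetical file name
        image = src.read()                                     # (bands, 256, 256)
        bounds = src.bounds

    buildings = gpd.read_file("labels/ndjamena_0001.geojson")  # hypothetical file name
    print(image.shape, len(buildings), bounds)
    ```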

  11. Antisemitism on Twitter: A Dataset for Machine Learning and Text Analytics

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 13, 2024
    Cite
    Soemer, Katharina (2024). Antisemitism on Twitter: A Dataset for Machine Learning and Text Analytics [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7872834
    Explore at:
    Dataset updated
    Dec 13, 2024
    Dataset provided by
    Soemer, Katharina
    Miehling, Daniel
    Karali, Sameer
    Jikeli, Gunther
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset from the Institute for the Study of Contemporary Antisemitism (ISCA) at Indiana University:

    The Social Media & Hate research lab at the Institute for the Study of Contemporary Antisemitism compiled this dataset using an annotation portal (Jikeli, Soemer, and Karali 2024), which was used to label tweets as either antisemitic or non-antisemitic, among other labels. Note that annotation was done on live data, including images and context, such as threads. All data was annotated by two experts, and all discrepancies were discussed (Jikeli et al. 2023).

    Content:

    This dataset contains 11,311 tweets covering a wide range of topics common in conversations about Jews, Israel, and antisemitism between January 2019 and April 2023. The dataset consists of random samples of relevant keywords during this time period. 1,953 tweets (17%) are antisemitic according to the IHRA definition of antisemitism.

    The distribution of tweets by year is as follows: 1,499 (13%) from 2019, 3,712 (33%) from 2020, 2,591 (23%) from 2021, 2,644 (23%) from 2022, and 865 (8%) from 2023. 6,365 (56%) contain the keyword "Jews," 4,134 (37%) include "Israel," 529 (5%) feature the derogatory term "ZioNazi*," and 283 (3%) use the slur "K---s." Some tweets may contain multiple keywords.

    725 out of the 6,365 tweets with the keyword "Jews" (11%) and 664 out of the 4,134 tweets with the keyword "Israel" (16%) were classified as antisemitic. 97 out of the 283 tweets using the antisemitic slur "K---s" (34%) are antisemitic. Interestingly, many tweets featuring the slur "K---s" actually call out its use. In contrast, the majority of tweets using the derogatory term "ZioNazi*" are antisemitic, with 467 out of 529 (88%) being classified as such.

    File Description:

    The dataset is provided in a csv file format, with each row representing a single message, including replies, quotes, and retweets. The file contains the following columns:

    ‘ID’: Represents the tweet ID.

    ‘Username’: Represents the username that posted the tweet.

    ‘Text’: Represents the full text of the tweet (not pre-processed).

    ‘CreateDate’: Represents the date on which the tweet was created.

    ‘Biased’: Represents the label given by our annotations as to whether the tweet is antisemitic or not.

    ‘Keyword’: Represents the keyword that was used in the query. The keyword can be in the text, including hashtags, mentioned users, or the username itself.
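    A minimal loading sketch based on the columns above follows; the CSV file name is an assumption, and the snippet assumes the ‘Biased’ label uses a 0/1 encoding.

    ```python
    # Hedged sketch: load the tweet CSV and summarize labels by keyword and year.
    import pandas as pd

    df = pd.read_csv("isca_antisemitism_tweets.csv")  # hypothetical file name

    # Share of antisemitic tweets per query keyword (assumes 0/1 'Biased' labels)
    print(df.groupby("Keyword")["Biased"].mean())

    # Tweets per year, from the creation date
    df["CreateDate"] = pd.to_datetime(df["CreateDate"])
    print(df["CreateDate"].dt.year.value_counts().sort_index())
    ```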

    Licences

    Data is published under the terms of the "Creative Commons Attribution 4.0 International" licence (https://creativecommons.org/licenses/by/4.0)

    Acknowledgements

    We are grateful for the support of Indiana University’s Observatory on Social Media (OSoMe) (Davis et al. 2016) and the contributions and annotations of all team members in our Social Media & Hate Research Lab at Indiana University’s Institute for the Study of Contemporary Antisemitism, especially Grace Bland, Elisha S. Breton, Kathryn Cooper, Robin Forstenhäusler, Sophie von Máriássy, Mabel Poindexter, Jenna Solomon, Clara Schilling, and Victor Tschiskale.

    This work used Jetstream2 at Indiana University through allocation HUM200003 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.

  12. Data from: Comment on the Definition and Labeling of pK50

    • acs.figshare.com
    txt
    Updated Aug 21, 2023
    Cite
    Mark A. Watson; Ryne C. Johnston; Art Bochevarov (2023). Comment on the Definition and Labeling of pK50 [Dataset]. http://doi.org/10.1021/acs.jcim.3c01210.s001
    Explore at:
    Available download formats: txt
    Dataset updated
    Aug 21, 2023
    Dataset provided by
    ACS Publications
    Authors
    Mark A. Watson; Ryne C. Johnston; Art Bochevarov
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    We propose a more rigorous definition for the recently introduced concept of pK50. The value of pK50 should be associated not with a “functional group”, as originally postulated, but instead with an atom. The proposed clarification is meant to improve the interpretation and labeling of pK50.

  13. ramp Building Footprint Dataset - Manjama, Sierra Leone

    • cmr.earthdata.nasa.gov
    • access.earthdata.nasa.gov
    Updated Oct 10, 2023
    Cite
    (2023). ramp Building Footprint Dataset - Manjama, Sierra Leone [Dataset]. http://doi.org/10.34911/rdnt.fp33ih
    Explore at:
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Area covered
    Description

    This chipped training dataset is over Manjama and includes high-resolution imagery (.tif format) and corresponding building footprint vector labels (.geojson format) in 256 x 256 pixel tile/label pairs. This dataset is a ramp Tier 1 dataset, meaning it has been thoroughly reviewed and improved. This dataset was used in developing the ramp baseline model and contains 4,671 tiles and 60,379 individual buildings. The satellite imagery resolution is 30 cm and was sourced from Maxar ODP (1040010056B6FA00). Dataset keywords: Urban, Peri-Urban.

  14. Bangla Multilabel Cyberbully, Sexual Harrasment, Threat and Spam Detection...

    • data.mendeley.com
    Updated Jul 16, 2024
    Cite
    Saieef Sunny (2024). Bangla Multilabel Cyberbully, Sexual Harrasment, Threat and Spam Detection Dataset [Dataset]. http://doi.org/10.17632/sz5558wrd4.3
    Explore at:
    Dataset updated
    Jul 16, 2024
    Authors
    Saieef Sunny
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Overview The Bangla Multilabel Cyberbully, Sexual Harassment, Threat, and Spam Detection Dataset is designed to facilitate the development of machine learning models to detect and classify various types of abusive content in Bangla social media text. This dataset contains a collection of comments annotated for multiple types of abuse, making it suitable for multilabel classification tasks. It aims to support research and development in natural language processing (NLP) to enhance online safety and moderate harmful content on Bangla language social media platforms.

    Purpose
    1. Train and evaluate machine learning models for detection of cyberbullying, sexual harassment, religious hate speech, threats, and spam in Bangla comments.
    2. Support research in NLP and machine learning focused on Bangla, a low-resource language.
    3. Aid in developing automated moderation systems for social media platforms to ensure safe and respectful communication.

    Data Collection Initially, we collected around 30,000 comments from social media platforms like Facebook and TikTok. These comments were in Bangla, English, and Banglish (Bangla written using English characters). Since our research focuses on Bangla abusive text detection, we refined the dataset through the following steps:

    1. We filtered out all comments written in English to focus on the Bangla text.
    2. To ensure data quality, we eliminated duplicate entries and rows with missing or null values.
    3. We removed any remaining English characters and both Bangla and English numerical values to ensure the analysis was based solely on Bangla text.

    After these steps, we obtained a final dataset of 12,557 comments. Each comment was manually labeled into five classes: bully, sexual, religious, threat, and spam. This dataset supports multilabel annotation, meaning a comment can simultaneously belong to more than one class.

    Dataset Columns
    1. Gender: Indicates the gender of the person who received the bullying.
    2. Profession: Indicates the profession of the person who received the bullying.
    3. Comment: Contains the text of the comment in Bangla.
    4. Bully: Binary label indicating whether the comment contains bullying content (0 for no, 1 for yes).
    5. Sexual: Binary label indicating whether the comment contains sexual harassment content (0 for no, 1 for yes).
    6. Religious: Binary label indicating whether the comment contains religious hate speech (0 for no, 1 for yes).
    7. Threat: Binary label indicating whether the comment contains threats (0 for no, 1 for yes).
    8. Spam: Binary label indicating whether the comment is considered spam (0 for no, 1 for yes).
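    A minimal multilabel loading sketch based on the columns above might look as follows; the file name is an assumption.

    ```python
    # Hedged sketch: build a (comments, 5) binary label matrix from the dataset.
    import pandas as pd

    df = pd.read_csv("bangla_multilabel_comments.csv")  # hypothetical file name

    label_cols = ["Bully", "Sexual", "Religious", "Threat", "Spam"]
    texts = df["Comment"]              # raw Bangla comment text
    Y = df[label_cols].values          # binary multilabel matrix, shape (n_comments, 5)

    # A comment can belong to several classes at once
    print((Y.sum(axis=1) > 1).sum(), "comments carry more than one label")
    ```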

    Applications
    1. Training and testing machine learning models for multilabel classification.
    2. Research on natural language processing (NLP) and cyberbullying detection in low-resource languages like Bangla.
    3. Developing automated systems for monitoring and moderating online content on social media platforms to ensure safe and respectful communication.

  15. Blank Discs and Labels Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Blank Discs and Labels Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-blank-discs-and-labels-market
    Explore at:
    Available download formats: csv, pdf, pptx
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Blank Discs and Labels Market Outlook



    The global blank discs and labels market size was valued at approximately USD 1.2 billion in 2023 and is forecasted to reach USD 1.8 billion by 2032, growing at a CAGR of 4.2% during the forecast period. The growth of this market is primarily driven by the increasing demand for physical data storage solutions, despite the rise of cloud storage. The versatility and ease of use associated with blank discs and labels continue to make them a preferred choice for many consumers and businesses.



    One of the primary growth factors for the blank discs and labels market is the consistent demand for physical media storage solutions. Many industries, particularly those in specialized sectors such as healthcare, legal, and media production, continue to rely heavily on physical storage media to archive sensitive information and large data files. Moreover, the music and entertainment industry, while embracing digital distribution, still exhibits a significant demand for physical media due to the popularity of physical music albums and movie collections among enthusiasts and collectors.



    Additionally, the educational sector is contributing to the market's growth. Educational institutions often require a reliable and cost-effective method for duplicating and distributing educational content, such as lectures, tutorials, and software programs. Blank discs and labels offer a tangible medium that students can easily access without needing an internet connection. This is particularly significant in regions where internet accessibility is limited or inconsistent, making physical media an essential tool in the educational process.



    Another driving factor is the rise of small businesses and home-based entrepreneurs who utilize blank discs and printable labels for various purposes, including branding, marketing, and data storage. The availability of affordable and user-friendly disc-burning and label-printing technology has empowered these smaller entities to produce professional-looking products without incurring substantial costs. This trend is expected to continue, as more individuals and small businesses seek cost-effective and customizable solutions for their media and labeling needs.



    Regionally, the blank discs and labels market sees a varied demand pattern. North America and Europe, being technologically advanced regions, have a substantial market share. However, the Asia Pacific region is emerging as a rapidly growing market due to the increasing adoption of digital media and the expansion of the educational sector. Moreover, the presence of numerous small and medium enterprises (SMEs) in the Asia Pacific region further fuels the demand for blank discs and labels for data storage and distribution needs.



    The introduction of CD-R and CD-RW formats has significantly impacted the blank discs market, offering users the flexibility to choose between write-once and rewritable options. CD-Rs are often preferred for permanent data storage, where the information needs to remain unchanged, such as in archiving important documents or creating music albums. On the other hand, CD-RWs provide the advantage of being reusable, allowing users to erase and rewrite data multiple times. This versatility makes them ideal for applications that require frequent updates or temporary storage, such as in educational settings or for software testing. The availability of these options has broadened the appeal of blank discs, catering to a wide range of consumer and business needs.



    Product Type Analysis



    The blank discs and labels market is segmented into CDs, DVDs, Blu-ray discs, printable labels, and adhesive labels. CDs and DVDs have been traditional staples in the market, used extensively for personal and professional data storage. Despite the proliferation of digital media, CDs and DVDs maintain their relevance due to their cost-effectiveness, durability, and ease of use. They are particularly favored in regions where digital alternatives might not be as accessible or affordable.



    Blu-ray discs represent a more advanced segment, offering significantly higher storage capacity compared to CDs and DVDs. This makes them ideal for high-definition video recording, large-scale data archiving, and software distribution, especially in industries requiring robust storage solutions. The increasing production of high-definition content and the need for reliable storage options are driving the demand for Blu-ray discs, althou

  16. Uplift Modeling, Marketing Campaign Data

    • kaggle.com
    zip
    Updated Nov 1, 2020
    Cite
    Möbius (2020). Uplift Modeling, Marketing Campaign Data [Dataset]. https://www.kaggle.com/arashnic/uplift-modeling
    Explore at:
    Available download formats: zip (340156703 bytes)
    Dataset updated
    Nov 1, 2020
    Authors
    Möbius
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Uplift modeling is an important yet novel area of research in machine learning which aims to explain and to estimate the causal impact of a treatment at the individual level. In the digital advertising industry, the treatment is exposure to different ads, and uplift modeling is used to direct marketing efforts towards users for whom it is the most efficient. The data is a collection of 13 million samples from a randomized control trial, scaling up previously available datasets by a healthy 590x factor.


    Content

    The dataset was created by the Criteo AI Lab. The dataset consists of 13M rows, each one representing a user with 12 features, a treatment indicator and 2 binary labels (visits and conversions). Positive labels mean the user visited/converted on the advertiser website during the test period (2 weeks). The global treatment ratio is 84.6%. It is usual that advertisers keep only a small control population as it costs them in potential revenue.

    Following is a detailed description of the features:

    • f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)
    • treatment: treatment group (1 = treated, 0 = control)
    • conversion: whether a conversion occurred for this user (binary, label)
    • visit: whether a visit occurred for this user (binary, label)
    • exposure: treatment effect, whether the user has been effectively exposed (binary)
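    As an illustration of how the treatment indicator and binary labels fit together, a naive uplift estimate is simply the difference in outcome rates between treated and control users; the file name below is an assumption.

    ```python
    # Hedged sketch: naive uplift (treated rate minus control rate) per outcome.
    import pandas as pd

    df = pd.read_csv("criteo-uplift.csv")  # hypothetical file name

    for outcome in ["visit", "conversion"]:
        treated = df.loc[df["treatment"] == 1, outcome].mean()
        control = df.loc[df["treatment"] == 0, outcome].mean()
        print(f"{outcome}: naive uplift = {treated - control:.5f}")
    ```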



    Acknowledgement

    The data was provided for the paper: "A Large Scale Benchmark for Uplift Modeling"

    https://s3.us-east-2.amazonaws.com/criteo-uplift-dataset/large-scale-benchmark.pdf

    • Eustache Diemert CAIL e.diemert@criteo.com
    • Artem Betlei CAIL & Université Grenoble Alpes a.betlei@criteo.com
    • Christophe Renaudin CAIL c.renaudin@criteo.com
    • Massih-Reza Amini Université Grenoble Alpes massih-reza.amini@imag.fr

    For privacy reasons the data has been sub-sampled non-uniformly so that the original incrementality level cannot be deduced from the dataset while preserving a realistic, challenging benchmark. Feature names have been anonymized and their values randomly projected so as to keep predictive power while making it practically impossible to recover the original features or user context.

    Inspiration

    We can foresee related usages such as but not limited to:

    • Uplift modeling
    • Interactions between features and treatment
    • Heterogeneity of treatment


  17. Invoice Management Dataset

    • universe.roboflow.com
    zip
    Updated Dec 28, 2024
    Cite
    CVIP Workspace (2024). Invoice Management Dataset [Dataset]. https://universe.roboflow.com/cvip-workspace/invoice-management
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 28, 2024
    Dataset authored and provided by
    CVIP Workspace
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Text Bounding Boxes
    Description

    Intelligent Invoice Management System

    Project Description:
    The Intelligent Invoice Management System is an advanced AI-powered platform designed to revolutionize traditional invoice processing. By automating the extraction, validation, and management of invoice data, this system addresses the inefficiencies, inaccuracies, and high costs associated with manual methods. It enables businesses to streamline operations, reduce human error, and expedite payment cycles.

    Problem Statement:
    Manual invoice processing involves labor-intensive tasks such as data entry, verification, and reconciliation. These processes are time-consuming, prone to errors, and can result in financial losses and delays. The diversity of invoice formats from various vendors adds complexity, making automation a critical need for efficiency and scalability.

    Proposed Solution:
    The Intelligent Invoice Management System automates the end-to-end process of invoice handling using AI and machine learning techniques. Core functionalities include:
    1. Invoice Generation: Automatically generate PDF invoices in at least four formats, populated with synthetic data.
    2. Data Development: Leverage a dataset containing fields such as receipt numbers, company details, sales tax information, and itemized tables to create realistic invoice samples.
    3. AI-Powered Labeling: Use Tesseract OCR to extract labeled data from invoice images, and train YOLO for label recognition, ensuring precise identification of fields.
    4. Database Integration: Store extracted information in a structured database for seamless retrieval and analysis.
    5. Web-Based Information System: Provide a user-friendly platform to upload invoices and retrieve key metrics, such as:
    - Total sales within a specified duration.
    - Total sales tax paid during a given timeframe.
    - Detailed invoice information in tabular form for specific date ranges.

    Key Features and Deliverables:
    1. Invoice Generation:
    - Generate 20,000 invoices using an automated script.
    - Include dummy logos, company details, and itemized tables for four items per invoice.

    2. Label Definition and Format:

      • Define structured labels (TBLR, CLASS Name, Recognized Text).
      • Provide labels in both XML and JSON formats for seamless integration.
    3. OCR and AI Training:

      • Automate labeling using Tesseract OCR for high-accuracy text recognition.
      • Train and test YOLO to detect and classify invoice fields (TBLR and CLASS).
    4. Database Management:

      • Store OCR-extracted labels and field data in a database.
      • Enable efficient search and aggregation of invoice data.
    5. Web-Based Interface:

      • Build a responsive system for users to upload invoices and retrieve data based on company name or NTN.
      • Display metrics and reports for total sales, tax paid, and invoice details over custom date ranges.

    Expected Outcomes:
    - Reduction in manual effort and operational costs.
    - Improved accuracy in invoice processing and financial reporting.
    - Enhanced scalability and adaptability for diverse invoice formats.
    - Faster turnaround time for invoice-related tasks.

    By automating critical aspects of invoice management, this system delivers a robust and intelligent solution to meet the evolving needs of businesses.
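    As a rough illustration of the OCR-based labeling step described above, pytesseract can return per-word text together with top/left/width/height boxes (the "TBLR" fields); the image path is an assumption and the YOLO training step is not shown.

    ```python
    # Hedged sketch: extract word-level text and bounding boxes from an invoice image.
    from PIL import Image
    import pytesseract

    img = Image.open("invoices/invoice_00001.png")  # hypothetical file name
    ocr = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

    for text, left, top, w, h in zip(ocr["text"], ocr["left"], ocr["top"],
                                     ocr["width"], ocr["height"]):
        if text.strip():
            # (top, bottom, left, right) box plus the recognized text
            print((top, top + h, left, left + w), text)
    ```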

  18. EMG from Combination Gestures with Ground-truth Joystick Labels

    • zenodo.org
    bin, zip
    Updated Jan 4, 2024
    Cite
    Niklas Smedemark-Margulies; Yunus Bicer; Elifnur Sunger; Stephanie Naufel; Tales Imbiriba; Eugene Tunik; Deniz Erdogmus; Mathew Yarossi (2024). EMG from Combination Gestures with Ground-truth Joystick Labels [Dataset]. http://doi.org/10.5281/zenodo.10393194
    Explore at:
    Available download formats: bin, zip
    Dataset updated
    Jan 4, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Niklas Smedemark-Margulies; Yunus Bicer; Elifnur Sunger; Stephanie Naufel; Tales Imbiriba; Eugene Tunik; Deniz Erdogmus; Mathew Yarossi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of surface EMG recordings from 11 subjects performing single and combination gestures, from "**A Multi-label Classification Approach to Increase Expressivity of EMG-based Gesture Recognition**" by Niklas Smedemark-Margulies, Yunus Bicer, Elifnur Sunger, Stephanie Naufel, Tales Imbiriba, Eugene Tunik, Deniz Erdogmus, and Mathew Yarossi.

    For more details and example usage, see the paper and the associated code repository.

    Contents

    Dataset of single and combination gestures from 11 subjects.
    Subjects participated in 13 experimental blocks.
    During each block, they followed visual prompts to perform gestures while also manipulating a joystick.
    Surface EMG was recorded from 8 electrodes on the forearm; labels were recorded according to the current visual prompt and the current state of the joystick.

    Experiments included the following blocks:

    • 1 Calibration block
    • 6 Simultaneous-Pulse Combination blocks (3 without feedback, 3 with feedback)
    • 6 Hold-Pulse Combination blocks (3 without feedback, 3 with feedback)

    The contents of each block type were as follows:

    • In the Calibration block, subjects performed 8 repetitions of each of the 4 direction gestures, 2 modifier gestures, and a resting pose.
      Each Calibration trial provided 160 overlapping examples, for a total of: 8 repetitions x 7 gestures x 160 examples = 8960 examples.
    • In Simultaneous-Pulse Combination blocks, subjects performed 8 trials of combination gestures, where both components were performed simultaneously.
      Each Simultaneous-Pulse trial provided 240 overlapping examples, for a total of: 8 trials x 240 examples = 1920 examples.
    • In Hold-Pulse Combination blocks, subjects performed 28 trials of combination gestures, where 1 gesture component was held while the other was pulsed.
      Each Hold-Pulse trial provided 240 overlapping examples, for a total of: 28 trials x 240 examples = 6720 examples.

    A single data example (from any block) corresponds to a 250 ms window of EMG recorded at 1926 Hz (with the built-in 20–450 Hz bandpass filtering applied).
    A 50 ms step size was used between consecutive windows; neighboring data examples therefore overlap.
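
    As a rough illustration of how these overlapping examples arise, the sketch below cuts a continuous multi-channel recording into 250 ms windows with a 50 ms hop. The exact windowing and rounding used by the authors may differ, and the array names are placeholders.

    ```python
    import numpy as np

    FS = 1926                      # sampling rate (Hz), per the description above
    WIN_MS, STEP_MS = 250, 50      # window length and hop (ms)

    def sliding_windows(emg, fs=FS, win_ms=WIN_MS, step_ms=STEP_MS):
        """Cut a (channels, samples) recording into overlapping windows.

        Returns an array of shape (n_windows, channels, window_samples),
        matching the (items, channels, timesteps) layout of data.npy.
        """
        win = int(round(fs * win_ms / 1000))    # ~482 samples per window
        step = int(round(fs * step_ms / 1000))  # ~96 samples between window starts
        n = 1 + (emg.shape[1] - win) // step
        return np.stack([emg[:, i * step:i * step + win] for i in range(n)])

    # Synthetic 8-channel, 13-second stand-in for one trial's EMG:
    demo = np.random.randn(8, 13 * FS)
    print(sliding_windows(demo).shape)
    ```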

    Feedback was provided as follows:

    • In blocks with feedback, a model pre-trained on the Calibration data was used to give real-time visual feedback during the trial.
    • In blocks without feedback, no model was used, and the visual prompt was the only source of information about the current gesture.

    For more details, see the paper.

    Labels

    Two types of labels are provided:

    • joystick labels were recorded based on the position of the joystick and are treated as ground truth.
    • visual labels were recorded based on the prompt currently being shown to the subject.

    For both joystick and visual labels, the following structure applies. Each gesture trial has a two-part label.

    The first label component describes the direction gesture, and takes values in {0, 1, 2, 3, 4}, with the following meaning:

    • 0 - "Up" (joystick pull)
    • 1 - "Down" (joystick push)
    • 2 - "Left" (joystick left)
    • 3 - "Right" (joystick right)
    • 4 - "NoDirection" (absence of a direction gesture; none of the above)

    The second label component describes the modifier gesture, and takes values in {0, 1, 2}, with the following meaning:

    • 0 - "Pinch" (joystick trigger button)
    • 1 - "Thumb" (joystick thumb button)
    • 2 - "NoModifier" (absence of a modifier gesture; none of the above)

    Examples of Label Structure

    Single gestures have labels like (0, 2) indicating ("Up", "NoModifier") or (4, 1) indicating ("NoDirection", "Thumb").

    Combination gestures have labels like (0, 0) indicating ("Up", "Pinch") or (2, 1) indicating ("Left", "Thumb").
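
    A tiny helper like the following (the function and list names are assumptions, not part of the dataset) turns a two-part label into readable text:

    ```python
    DIRECTIONS = ["Up", "Down", "Left", "Right", "NoDirection"]
    MODIFIERS = ["Pinch", "Thumb", "NoModifier"]

    def describe(label):
        """Map a (direction, modifier) integer pair to gesture names."""
        direction, modifier = label
        return (DIRECTIONS[direction], MODIFIERS[modifier])

    print(describe((0, 2)))   # ('Up', 'NoModifier')  - single gesture
    print(describe((2, 1)))   # ('Left', 'Thumb')     - combination gesture
    ```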

    File layout

    Data are provided in both NumPy and MATLAB formats; the descriptions below apply to both.

    Each experimental block is provided in a separate folder.
    Within one experimental block, the following files are provided:

    • `data.npy` - Raw EMG data, with shape (items, channels, timesteps).
    • `joystick_direction_labels.npy` - one-hot joystick direction labels, with shape (items, 5).
    • `joystick_modifier_labels.npy` - one-hot joystick modifier labels, with shape (items, 3).
    • `visual_direction_labels.npy` - one-hot visual direction labels, with shape (items, 5).
    • `visual_modifier_labels.npy` - one-hot visual modifier labels, with shape (items, 3).

    Loading data

    For example code snippets for loading data, see the associated code repository.
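
    The repository's snippets are the authoritative reference; as a minimal sketch (the block folder name below is hypothetical), one block's NumPy files could be read and the one-hot labels converted back to integers like this:

    ```python
    import numpy as np
    from pathlib import Path

    block = Path("Calibration")   # hypothetical folder name for one experimental block

    data = np.load(block / "data.npy")                             # (items, channels, timesteps)
    dir_onehot = np.load(block / "joystick_direction_labels.npy")  # (items, 5)
    mod_onehot = np.load(block / "joystick_modifier_labels.npy")   # (items, 3)

    direction = dir_onehot.argmax(axis=1)   # integers 0-4, see the mapping above
    modifier = mod_onehot.argmax(axis=1)    # integers 0-2, see the mapping above
    print(data.shape, direction[:5], modifier[:5])
    ```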

  19. Urban Sound & Sight (Urbansas) - Labeled set

    • zenodo.org
    • explore.openaire.eu
    txt, zip
    Updated Jun 20, 2022
    + more versions
    Cite
    Magdalena Fuentes; Bea Steers; Pablo Zinemanas; Martín Rocamora; Luca Bondi; Julia Wilkins; Qianyi Shi; Yao Hou; Samarjit Das; Xavier Serra; Juan Pablo Bello (2022). Urban Sound & Sight (Urbansas) - Labeled set [Dataset]. http://doi.org/10.5281/zenodo.6658386
    Explore at:
    txt, zipAvailable download formats
    Dataset updated
    Jun 20, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Magdalena Fuentes; Bea Steers; Pablo Zinemanas; Martín Rocamora; Luca Bondi; Julia Wilkins; Qianyi Shi; Yao Hou; Samarjit Das; Xavier Serra; Juan Pablo Bello
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Urban Sound & Sight (Urbansas):

    Version 1.0, May 2022

    Created by
    Magdalena Fuentes (1, 2), Bea Steers (1, 2), Pablo Zinemanas (3), Martín Rocamora (4), Luca Bondi (5), Julia Wilkins (1, 2), Qianyi Shi (2), Yao Hou (2), Samarjit Das (5), Xavier Serra (3), Juan Pablo Bello (1, 2)
    1. Music and Audio Research Lab, New York University
    2. Center for Urban Science and Progress, New York University
    3. Universitat Pompeu Fabra, Barcelona, Spain
    4. Universidad de la República, Montevideo, Uruguay
    5. Bosch Research, Pittsburgh, PA, USA

    Publication

    If using this data in academic work, please cite the following paper, which presented this dataset:
    M. Fuentes, B. Steers, P. Zinemanas, M. Rocamora, L. Bondi, J. Wilkins, Q. Shi, Y. Hou, S. Das, X. Serra, J. Bello. “Urban Sound & Sight: Dataset and Benchmark for Audio-Visual Urban Scene Understanding”. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.

    Description

    Urbansas is a dataset for the development and evaluation of machine listening systems for audiovisual spatial urban understanding. One of the main challenges in this field is the lack of realistic, labeled data for training and evaluating models on their ability to localize sound sources using a combination of audio and video.

    We set four main goals for creating this dataset:

    1. To compile a set of real-field audio-visual recordings;
    2. The recordings should be stereo, to allow exploring sound localization in the wild;
    3. The compilation should be varied in terms of scenes and recording conditions, to be meaningful for training and evaluating machine learning models;
    4. The labeled collection should be accompanied by a larger unlabeled collection with similar characteristics, to allow exploring self-supervised learning in urban contexts.

    Audiovisual data

    We have compiled and manually annotated Urbansas from two publicly available datasets, plus the addition of unreleased material. The public datasets are the TAU Urban Audio-Visual Scenes 2021 Development dataset (street-traffic subset) and the Montevideo Audio-Visual Dataset (MAVD):


    Wang, Shanshan, et al. "A curated dataset of urban scenes for audio-visual scene analysis." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.

    Zinemanas, Pablo, Pablo Cancela, and Martín Rocamora. "MAVD: A dataset for sound event detection in urban environments." Detection and Classification of Acoustic Scenes and Events, DCASE 2019, New York, NY, USA, 25–26 oct, page 263--267 (2019).


    The TAU dataset consists of 10-second segments of audio and video from different scenes across European cities, traffic being one of the scenes; only the scenes labeled as traffic are included in Urbansas. MAVD is an audio-visual traffic dataset curated at different locations in Montevideo, Uruguay, with annotations of vehicle and vehicle-component sounds (e.g. engine, brakes) for sound event detection. Besides the published datasets, we include a total of 9.5 hours of unpublished material recorded in Montevideo with the same recording devices as MAVD but covering new locations and scenes.

    Recordings for TAU were acquired using a GoPro Hero 5 (30fps, 1280x720) and a Soundman OKM II Klassik/studio A3 electret binaural in-ear microphone with a Zoom F8 audio recorder (48kHz, 24 bits, stereo). Recordings for MAVD were collected using a GoPro Hero 3 (24fps, 1920x1080) and a SONY PCM-D50 recorder (48kHz, 24 bits, stereo).

    In total, Urbansas includes 15 hours of stereo audio and video, stored in separate 10-second MPEG4 (1280x720, 24fps) and WAV (48kHz, 24 bit, 2 channel) files. Both released video datasets are already anonymized to obscure people and license plates; the unpublished MAVD material was anonymized similarly with the same anonymization tool. We also distribute the 2fps video used for producing the annotations.

    The audio and video files both share the same filename stem, meaning that they can be associated after removing the parent directory and extension.
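
    For example, the pairing can be done by matching stems; the directory names and file extensions below are assumptions based on the formats listed above, not fixed by the dataset description.

    ```python
    from pathlib import Path

    video_dir, audio_dir = Path("video"), Path("audio")   # assumed directory names

    # Pair each video clip with its audio clip via the shared filename stem.
    pairs = {p.stem: (p, audio_dir / f"{p.stem}.wav")
             for p in video_dir.glob("*.mp4")}
    ```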

    MAVD:
    video/

    TAU:
    video/


    where location_id in both cases includes the city and an ID number.


    | city | places | clips | mins | frames | labeled mins |
    | --- | --- | --- | --- | --- | --- |
    | Montevideo | 8 | 4085 | 681 | 980400 | 92 |
    | Stockholm | 3 | 91 | 15 | 21840 | 2 |
    | Barcelona | 4 | 144 | 24 | 34560 | 24 |
    | Helsinki | 4 | 144 | 24 | 34560 | 16 |
    | Lisbon | 4 | 144 | 24 | 34560 | 19 |
    | Lyon | 4 | 144 | 24 | 34560 | 6 |
    | Paris | 4 | 144 | 24 | 34560 | 2 |
    | Prague | 4 | 144 | 24 | 34560 | 2 |
    | Vienna | 4 | 144 | 24 | 34560 | 6 |
    | London | 5 | 144 | 24 | 34560 | 4 |
    | Milan | 6 | 144 | 24 | 34560 | 6 |
    | Total | 50 | 5472 | 912 | 1.3M | 180 |


    Annotations


    Of the 15 hours of audio and video, 3 hours (1.5 hours TAU, 1.5 hours MAVD) were manually annotated by our team in both the audio and the image frames; the remaining 12 hours (2.5 hours TAU, 9.5 hours of unpublished material) are left unlabeled for the benefit of unsupervised models. The distribution of clips across locations was chosen to maximize variance across scenes. Annotations were collected at 2 frames per second (FPS), which balances temporal granularity against clip coverage.

    The annotation data is contained in video_annotations.csv and audio_annotations.csv.

    Video Annotations

    Each row in the video annotations represents a single object in a single frame of the video. The annotation schema is as follows:

    • frame_id: The index of the frame within the clip the annotation is associated with. This index is 0-based and goes up to 19 (assuming 10-second clips with annotations at 2 FPS)
    • track_id: The ID of the detected instance that identifies the same object across different frames. These IDs are guaranteed to be unique within a clip.
    • x, y, w, h: The top-left corner and width and height of the object’s bounding box in the video. The values are given in absolute coordinates with respect to the image size (1280x720).
    • class_id: The class index, taking values in [0, 1, 2, 3, -1]; see label for the index-to-name mapping. A value of -1 marks rows that carry only clip-level annotations (such as night and city) and no object; filter out class_id == -1 when working with bounding boxes.
    • label: The label text, equivalent to LABELS[class_id] with LABELS = [car, bus, motorbike, truck, -1]. The label -1 plays the same clip-level role as above.
    • visibility: The visibility of the object. This is 1 unless the object becomes obstructed, where it changes to 0.
    • filename: The file ID of the associated file. This is the file’s path minus the parent directory and extension.
    • city: The city where the clip was collected in.
    • location_id: The specific name of the location. This may include an integer ID following the city name for cases where there are multiple collection points.
    • time: The time (in seconds) of the annotation, relative to the start of the file. Equivalent to frame_id / fps .
    • night: Whether the clip takes place during the day or at night. This value is singular per clip.
    • subset: Which data source the data originally belongs to (TAU or MAVD).
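
    As a rough sketch of working with this schema (only the columns listed above and pandas are assumed), the clip-level rows can be filtered out and the boxes normalized like so:

    ```python
    import pandas as pd

    video_ann = pd.read_csv("video_annotations.csv")

    # Drop clip-level rows; class_id == -1 carries no bounding box.
    boxes = video_ann[video_ann["class_id"] != -1].copy()

    # Convert absolute pixel coordinates (1280x720 frames) to normalized [0, 1].
    boxes[["x", "w"]] = boxes[["x", "w"]] / 1280.0
    boxes[["y", "h"]] = boxes[["y", "h"]] / 720.0

    # One trajectory per (filename, track_id): how many frames each object is annotated in.
    traj_len = boxes.groupby(["filename", "track_id"])["frame_id"].count()
    print(traj_len.describe())
    ```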

    Audio Annotations

    Each row represents a single object instance, along with the time range that it exists within the clip. The annotation schema is as follows:

    • filename: The file ID of the associated audio file; see filename above.
    • class_id, label: See above. Audio has an additional class_id of 4 (label = offscreen), indicating an off-screen vehicle, i.e. a vehicle that is heard but not seen. A class_id of -1 indicates a clip-level annotation for a clip with no object annotations (an empty scene).
    • non_identifiable_vehicle_sound: True if the region contains the sound of vehicles where individual instances cannot be uniquely identified.
    • start, end: The start and end times (in seconds) of the annotation relative to the file.
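
    A similar sketch for the audio annotations (again assuming only the columns above) computes event durations and isolates off-screen vehicles:

    ```python
    import pandas as pd

    audio_ann = pd.read_csv("audio_annotations.csv")

    events = audio_ann[audio_ann["class_id"] >= 0].copy()   # drop clip-level (-1) rows
    events["duration"] = events["end"] - events["start"]    # seconds

    offscreen = events[events["class_id"] == 4]             # heard but not seen
    print(len(offscreen), "off-screen vehicle events;",
          "median event duration", round(events["duration"].median(), 2), "s")
    ```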

    Conditions of use

    Dataset created by Magdalena Fuentes, Bea Steers, Pablo Zinemanas, Martín Rocamora, Luca Bondi, Julia Wilkins, Qianyi Shi, Yao Hou, Samarjit Das, Xavier Serra, and Juan Pablo Bello.

    The Urbansas dataset is offered free of charge under the following terms:

    • Urbansas annotations are released under the CC BY 4.0 license
    • Urbansas video and audio inherit the licenses of their original sources:
      • MAVD subset is released under CC BY 4.0
      • TAU subset is released under a Non-Commercial license

    Feedback

    Please help us improve Urbansas by sending your feedback to:

    • Magdalena Fuentes: mfuentes@nyu.edu
    • Bea Steers: bsteers@nyu.edu

    In case of a problem, please include as many details as possible.

    Acknowledgments

    This work was partially supported by the National Science Foundation.

  20. Fish Detection AI, Optic and Sonar-trained Object Detection Models

    • mhkdr.openei.org
    • data.openei.org
    • +1more
    archive +2
    Updated Jun 25, 2014
    + more versions
    Cite
    Katherine Slater; Delano Yoder; Carlos Noyes; Brett Scott (2014). Fish Detection AI, Optic and Sonar-trained Object Detection Models [Dataset]. https://mhkdr.openei.org/submissions/600
    Explore at:
    website, archive, text_documentAvailable download formats
    Dataset updated
    Jun 25, 2014
    Dataset provided by
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Renewable Power Office. Water Power Technologies Office (EE-4WP)
    Marine and Hydrokinetic Data Repository
    Water Power Technology Office
    Authors
    Katherine Slater; Delano Yoder; Carlos Noyes; Brett Scott
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Fish Detection AI project aims to improve the efficiency of fish monitoring around marine energy facilities to comply with regulatory requirements. Despite advancements in computer vision, there is limited focus on sonar images, identifying small fish with unlabeled data, and methods for underwater fish monitoring for marine energy.

    A YOLO (You Only Look Once) computer vision model was developed to identify fish in underwater environments, using the Eyesea dataset (optical) and sonar images from Alaska Fish and Games. The YOLO models were trained in a supervised fashion on labeled fish data and then applied to unseen datasets, with the aim of reducing the need to label data and train new models for each new location. Hyper-image analysis and various image preprocessing methods were also explored to enhance fish detection.

    In this research we achieved enhanced YOLO performance relative to a published article (Xu, Matzner 2018) that used earlier YOLO versions for fish object identification: a best mean Average Precision (mAP) of 0.68 on the Eyesea optical dataset using YOLO v8 (medium-sized model), surpassing that publication's YOLO v3 benchmarks. We further demonstrated up to 0.65 mAP on unseen sonar domains by leveraging a hyper-image approach (stacking consecutive frames), showing promising cross-domain adaptability.
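
    The description does not spell out how frames were stacked; as one plausible reading (an assumption, not the authors' documented method), a hyper-image can be formed by concatenating k consecutive single-channel frames along a channel axis:

    ```python
    import numpy as np

    def hyper_image(frames, k=3):
        """Stack k consecutive single-channel frames along a channel axis.

        frames: (n_frames, H, W) array. Returns (n_frames - k + 1, H, W, k).
        The exact grouping used in the study is assumed here for illustration.
        """
        return np.stack([frames[i:i + k].transpose(1, 2, 0)
                         for i in range(len(frames) - k + 1)])

    sonar = np.random.rand(10, 480, 640)     # synthetic stand-in for sonar frames
    print(hyper_image(sonar).shape)          # (8, 480, 640, 3)
    ```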

    This submission of data includes:

    • The best-performing trained YOLO model weights (PyTorch .pt files), which can be applied directly for object detection; these are found in the Yolo_models_downloaded zip file. (A usage sketch follows this list.)
    • A documentation file ("Yolo_Object_Detection_How_To_Document.docx") explaining the upload and the goals of each of experiments 1-5.
    • Coding files: 5 sub-folders of Python, shell, and YAML files used to run experiments 1-5, plus a separate folder for the YOLO models; each is provided in its own zip file named after the corresponding experiment.
    • Sample data structures (sample1 and sample2, each in its own zip file) showing how the raw data should be structured after running our provided code on the raw downloaded data.
    • A link to the article we were replicating (Xu, Matzner 2018).
    • A link to the YOLO documentation site from the original creators of the model (Ultralytics).
    • A link to the downloadable EyeSea data set from PNNL (instructions on how to download and format the data to replicate these experiments are in the How To document).
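
    As a usage sketch for the provided weights, using the Ultralytics package the description points to (the weight and image filenames below are placeholders):

    ```python
    from ultralytics import YOLO   # pip install ultralytics

    # Placeholder filename; substitute one of the .pt files from Yolo_models_downloaded.
    model = YOLO("fish_yolov8m_best.pt")

    # Run detection on one underwater frame (path is illustrative).
    results = model.predict("sample_frame.png", conf=0.25)
    for r in results:
        for box in r.boxes:
            print(int(box.cls.item()), float(box.conf.item()), box.xyxy.tolist())
    ```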
