9 datasets found
  1. Data Collection Labeling Market Demand, Size and Competitive Analysis |...

    • techsciresearch.com
    Updated Jan 15, 2025
    TechSci Research (2025). Data Collection Labeling Market Demand, Size and Competitive Analysis | TechSci Research [Dataset]. https://www.techsciresearch.com/report/data-collection-labeling-market/19345.html
    Dataset updated
    Jan 15, 2025
    Dataset authored and provided by
    TechSci Research
    License

    https://www.techsciresearch.com/privacy-policy.aspx

    Description

    Global Data Collection Labeling market was valued at USD 2.23 Billion in 2024 and is expected to reach USD 8.23 Billion by 2030 with a CAGR of 24.12% during the forecast period.

    Pages: 180
    Market Size (2024): USD 2.23 billion
    Forecast Market Size (2030): USD 8.23 billion
    CAGR (2025-2030): 24.12%
    Fastest Growing Segment: BFSI
    Largest Market: North America
    Key Players: 1. Appen Limited 2. Cogito Tech 3. Deep Systems, LLC 4. CloudFactory Limited 5. Anthropic, PBC 6. Alegion AI, Inc 7. Hive Technology, Inc 8. Toloka AI BV 9. Labelbox, Inc. 10. Summa Linguae Technologies
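
The growth figures quoted for this report can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below is illustrative only: the function name `cagr` is hypothetical, and treating 2024 to 2030 as six compounding periods is an assumption, since the listing states its rate for 2025-2030 without saying which base year it compounds from.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes quoted in the listing (USD billion).
size_2024 = 2.23
size_2030 = 8.23

# Assumption: six compounding periods (2024 -> 2030).
implied_rate = cagr(size_2024, size_2030, years=6)
print(f"Implied CAGR over 6 periods: {implied_rate:.2%}")
```

Under that assumption the implied rate lands between 24% and 25%, close to the quoted figure; a small gap would be consistent with rounding or a different base year.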

  2. Labels Import Data of Entertainment Merchandise Nu Inc Importer in USA

    • seair.co.in
    Seair Exim, Labels Import Data of Entertainment Merchandise Nu Inc Importer in USA [Dataset]. https://www.seair.co.in
    Available download formats: .bin, .xml, .csv, .xls
    Dataset provided by
    Seair Exim Solutions
    Authors
    Seair Exim
    Area covered
    United States
    Description

    Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.

  3. Global Text Data Labeling Market Research Report: By Labeling Methodology...

    • wiseguyreports.com
    Updated Aug 6, 2024
    Wiseguy Research Consultants Pvt Ltd (2024). Global Text Data Labeling Market Research Report: By Labeling Methodology (Manual Labeling, Semi-Automated Labeling, Automated Labeling, Crowd-Sourced Labeling), By Labeling Type (Named Entity Recognition, Part-of-Speech Tagging, Sentiment Analysis, Intent Detection, Machine Translation), By Industry Vertical (Automotive, Healthcare, Financial Services, Retail, Media and Entertainment), By Data Type (Textual Data, Image Data, Audio Data, Video Data), By Deployment Model (On-Premise, Cloud-Based, Hybrid) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/cn/reports/text-data-labeling-market
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    Wiseguy Research Consultants Pvt Ltd
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Jan 8, 2024
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2024
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2023: 0.43 (USD Billion)
    MARKET SIZE 2024: 0.5 (USD Billion)
    MARKET SIZE 2032: 1.623 (USD Billion)
    SEGMENTS COVERED: Labeling Methodology, Labeling Type, Industry Vertical, Data Type, Deployment Model, Regional
    COUNTRIES COVERED: North America, Europe, APAC, South America, MEA
    KEY MARKET DYNAMICS: Increasing adoption of natural language processing (NLP) and machine learning (ML); Rising demand for data labeling for training AI and ML models; Growing need for accurate and consistent data labeling; Stringent data privacy regulations; Technological advancements in labeling tools and techniques
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Playment, Cogito, Telus International, Appen, Lionbridge AI, iMerit Technology Services, Qualtrics, Hive, SuperAnnotate, Premise Data, Prolific, Cloud Factory, Scale AI, Sama
    MARKET FORECAST PERIOD: 2025 - 2032
    KEY MARKET OPPORTUNITIES: 1. Growing Demand for NLP and AI 2. Increased Focus on Data Quality 3. Expansion into New Industries 4. Advancements in ML Algorithms 5. Surge in IoT and Connected Devices
    COMPOUND ANNUAL GROWTH RATE (CAGR): 15.88% (2025 - 2032)
  4. English Headlines Dataset

    • kaggle.com
    Updated Apr 9, 2021
    Anil Guven (2021). English Headlines Dataset [Dataset]. https://www.kaggle.com/datasets/anil1055/english-headlines-dataset/data
    Dataset updated
    Apr 9, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anil Guven
    Description

    The dataset covers 5 news-type labels: business, entertainment, medicine, technology and sport. It was created from uci-news-aggregator and sports sites (BBC, ESPN, etc.). There are 1,000 headlines for each label, for a total of 5,000 headlines.

    The dataset has been used in a number of studies. If you use it, please cite this paper: Güven, Z. A., Diri, B., & Çakaloǧlu, T. (2018). Classification of New Titles by Two Stage Latent Dirichlet Allocation. Proceedings - 2018 Innovations in Intelligent Systems and Applications Conference, ASYU 2018. https://doi.org/10.1109/ASYU.2018.8554027

  5. ⚛️ BreakingBad Script

    • kaggle.com
    Updated Mar 27, 2024
    mexwell (2024). ⚛️ BreakingBad Script [Dataset]. https://www.kaggle.com/datasets/mexwell/breakingbad-script
    Dataset updated
    Mar 27, 2024
    Dataset provided by
    Kaggle
    Authors
    mexwell
    Description

    Breaking Bad script scraped directly from Forever Dreaming. The data has about 5596 dialogs (observations) in total with 5 variables: actor, text (the dialog itself), season, episode, and title of the episode.

    Important Information

    If you are a fan, you will know that the series has a total of 5 seasons. Unfortunately, the transcript data available online has labels attached to each dialog only up to episode 6 of season 3. I exhausted all other resources trying to get labeled transcripts for the remaining episodes, but could not find anything apart from a few original PDFs of the screenplay. After reviewing those documents before converting them to text, I realised that there were barely any dialogs and that the PDFs mainly focused on setting the scene for each act, which is not what I was looking for. Therefore, I made a conscious decision to work with the data I have.

    Original data can be found here

  6. KurdiSent: A Corpus For Kurdish Sentiment Analysis

    • data.mendeley.com
    Updated Feb 3, 2023
    + more versions
    Soran Badawi (2023). KurdiSent: A Corpus For Kurdish Sentiment Analysis [Dataset]. http://doi.org/10.17632/3yrkswy6ph.1
    Dataset updated
    Feb 3, 2023
    Authors
    Soran Badawi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Kurdish language is regarded as one of the less-resourced languages. It is spoken by 30-40 million people worldwide. Its alphabet has 33 letters, largely similar to those of Arabic. Kurdish has two major dialects, Sorani and Badini. The dataset is a collection of texts written in the Sorani dialect. It contains tweets and comments from large social media platforms: Twitter, Facebook and YouTube. For security reasons, and in line with the policies of Twitter, Facebook and YouTube, we removed user identities. We collected tweets and comments that were published during the coronavirus pandemic. The tweets and comments are raw texts, and the content covers a varied range of topics, including politics, sports, entertainment and social life.

    Data Labeling

    Facepager was used to crawl the comments from Facebook and YouTube, and the Twitter developer platform was used to mine the tweets. The dataset was annotated manually by three native Kurdish speakers. The annotators were asked to identify the class and category of each text. The classes were positive, negative and neutral; the categories were news, technology, art, social and health. Texts for which at least two annotators agreed on a label and category were regarded as conflict-free and accepted for further processing. Texts on which all three raters disagreed were removed from the dataset. The doccano tool was used to help the annotators label each text one by one.
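
The agreement rule described for this corpus (keep a text when at least two of the three annotators agree, drop it otherwise) can be sketched as a simple majority vote. This is a minimal illustration, not the authors' actual pipeline: the function name `majority_label`, the text IDs and the sample votes are all hypothetical, and the sketch resolves a single field, whereas the description requires agreement on both class and category.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(votes: Sequence[str], min_agreement: int = 2) -> Optional[str]:
    """Return the label chosen by at least `min_agreement` annotators, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agreement else None


# Hypothetical annotations: three votes per text.
annotations = {
    "text_1": ["positive", "positive", "neutral"],   # two agree, so the text is kept
    "text_2": ["negative", "negative", "negative"],  # all agree, so the text is kept
    "text_3": ["positive", "negative", "neutral"],   # no agreement, so the text is dropped
}

resolved = {}
for text_id, votes in annotations.items():
    label = majority_label(votes)
    if label is not None:
        resolved[text_id] = label

print(resolved)
```

Running the same function once per field (class, then category) and keeping only texts where both calls return a label would mirror the stated rule more closely.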

  7. Global Holographic Scratch-off Labels Market Competitive Landscape 2025-2032...

    • statsndata.org
    excel, pdf
    Updated May 2025
    Stats N Data (2025). Global Holographic Scratch-off Labels Market Competitive Landscape 2025-2032 [Dataset]. https://www.statsndata.org/report/global-252569
    Available download formats: excel, pdf
    Dataset updated
    May 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Holographic Scratch-off Labels market has witnessed significant growth in recent years, driven by the increasing demand for innovative packaging solutions across various industries, including retail, entertainment, and promotional sectors. These labels, characterized by their vibrant holographic designs and inte

  8. East African News Classification

    • kaggle.com
    Updated Jan 29, 2023
    The Devastator (2023). East African News Classification [Dataset]. https://www.kaggle.com/datasets/thedevastator/east-african-news-classification
    Dataset updated
    Jan 29, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Africa, East Africa
    Description

    East African News Classification

    Classifying Text Content Across East Africa

    By [source]

    About this dataset

    This Swahili News Classification Dataset offers insight into media streams across East Africa, including coverage related to racial tensions and social shifts. Using the text, label and content columns, researchers and data scientists can track classified news content from different countries in the region. From political unrest to gender-based violence, the dataset offers a broad portrait of news stories from East African nations, with practical applications for understanding how culture shapes press reporting and how media outlets portray world events. Alongside the text of individual stories, classifications such as category and label are important for drawing conclusions: precise categorization helps keep collected data points aligned while recognizing the nuances of each country's media stream. The dataset is useful for projects that study communication between societies or track information flows within an interconnected global system.

    How to use the dataset

    This dataset is perfect for anyone looking to build a machine learning model to classify news content across East Africa. With this dataset, you can create a classifier that can automatically identify and categorize news stories into topics such as politics, economics, health, sports, environment and entertainment. This dataset contains labeled text data for training a model to learn how to classify the content of news articles written in Swahili.

    Step 1: Understand the Dataset

    The first step towards building your classifier is getting familiar with the dataset provided. The list below outlines each column in the dataset:

    • text: The text of the news article
    • label: The category or topic assigned to the article
    • content: The text content of the news article
    • category: The category or topic assigned to the article

      This dataset contains all you need to create your classification model: pre-labeled articles with topics assigned by human annotators. There are no date values in any of the columns listed above, and because all articles are already labeled, we won’t need them to build the classifier.

      We also need to know which language the texts are written in; here we are classifying Swahili texts. With that settled, we can select an algorithm for the task, namely a supervised machine learning approach trained on the labeled data. Candidates include traditional text classification models such as Naive Bayes classifiers (NBCs) and Maximum Entropy (MaxEnt) models, although these may not be robust or accurate enough when predicting unseen texts. Deep learning techniques, such as multi-layer perceptrons (MLPs), may deliver better performance, but their many layers make them more computationally expensive than simpler linear models. This tutorial does not cover those topics in depth, as they are beyond its scope, so let’s keep moving.
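
To make the Naive Bayes option mentioned here concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing, written with the standard library only. It is an assumption-laden illustration rather than the dataset author's method: the class name `NaiveBayesTextClassifier` is hypothetical, tokenization is a plain whitespace split, and the four training sentences are made-up English placeholders standing in for the dataset's labeled Swahili articles.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            label_total = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Placeholder training data (hypothetical, not taken from the dataset).
train_texts = [
    "team wins football match",
    "player scores late goal",
    "parliament passes new budget",
    "president signs trade law",
]
train_labels = ["sports", "sports", "politics", "politics"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
predicted = model.predict("goal in the football match")
print(predicted)
```

In practice the `text` and `label` columns would replace the placeholder lists, and a held-out split would be needed to judge how well the model handles unseen articles.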

      Step 2: Preprocess Text Data

      Once you understand what each column represents, you can start preparing the data by preprocessing it so that it is ready for whichever algorithm you choose.
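
The listing does not say which preprocessing steps the tutorial has in mind, so the sketch below shows one common baseline under stated assumptions: Unicode normalization, lowercasing, removal of punctuation and digits, and whitespace tokenization. The function name `preprocess` and the sample headline are hypothetical.

```python
import re
import unicodedata
from typing import List


def preprocess(text: str) -> List[str]:
    """Normalize Unicode, lowercase, strip punctuation and digits, then tokenize."""
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace punctuation, digits and underscores with spaces.
    text = re.sub(r"[^\w\s]|\d|_", " ", text)
    return text.split()


# Hypothetical sample headline, used only to exercise the function.
sample = "Habari za Leo: Serikali yatangaza bajeti 2023!"
tokens = preprocess(sample)
print(tokens)
```

Steps such as stop-word removal or stemming are left out on purpose, since suitable resources for Swahili would have to be chosen separately.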

    Research Ideas

    • Predicting trend topics of news coverage across East Africa by identifying news categories with the highest frequency of occurrences over given time periods.
    • Identifying and flagging potential bias in news coverage across East Africa by analyzing the prevalence of certain labels or topics to discover potential trends in reporting style.
    • Developing a predictive model to determine which topic or category will have higher visibility based on the amount of related content that is published in each region around East Africa

    Acknowledgements

    &...

  9. Cat Images Classification Dataset

    • kaggle.com
    Updated Oct 19, 2023
    Satwik Tanwar (2023). Cat Images Classification Dataset [Dataset]. https://www.kaggle.com/datasets/satwiktanwar/cat-images-classification-dataset
    Dataset updated
    Oct 19, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Satwik Tanwar
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The dataset contains two files in h5 format:
    1. test_catvnoncat.h5: contains 50 test examples of cat and non-cat images
    2. train_catvnoncat.h5: contains 209 train examples of cat and non-cat images

    The dataset contains images of size 64x64. The task is to classify an image as a cat (1) or not a cat (0). I am going to publish a series of notebooks for this dataset that would demonstrate neural networks from very basic level. Each notebook will build upon the previous one. Stay tuned to learn neural networks with the help of those notebooks!

    You can use the code snippet below to load and visualize the dataset.

    ```python
    import os

    import h5py
    import matplotlib.pyplot as plt
    import numpy as np

    # List the files available in the Kaggle input directory.
    for dirname, _, filenames in os.walk('/kaggle/input'):
        for filename in filenames:
            print(os.path.join(dirname, filename))


    def load_data():
        train_dataset = h5py.File('/kaggle/input/cat-images-classification-dataset/train_catvnoncat.h5', "r")
        train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
        train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

        test_dataset = h5py.File('/kaggle/input/cat-images-classification-dataset/test_catvnoncat.h5', "r")
        test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
        test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

        classes = np.array(test_dataset["list_classes"][:])  # the list of classes

        # Reshape the label vectors to shape (1, number_of_examples).
        train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
        test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

        return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes


    # load data
    train_x_orig, train_y, test_x_orig, test_y, classes = load_data()

    # visualize an example image
    index = 10
    plt.imshow(train_x_orig[index])
    print("y = " + str(train_y[0, index]) + ". It's a " + classes[train_y[0, index]].decode("utf-8") + " picture.")
    ```
