20 datasets found
  1. Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data...

    • datarade.ai
    Updated Jan 3, 2025
    Cite
    Nexdata (2025). Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data | Large Language Model(LLM) Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-image-and-video-description-data-1-pb-multimoda-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Jan 3, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    Belgium, Ecuador, Czech Republic, Canada, Netherlands, Finland, Malta, Mexico, United Arab Emirates, Israel
    Description
    1. Image Description Data

    Data size: 500 million pairs

    Image type: generic scenes (portraits, landscapes, animals, etc.), human actions, picture books, magazines, PPT and charts, app screenshots, etc.

    Resolution: 4K+

    Description languages: English, Spanish, Portuguese, French, Korean, German, Chinese, Japanese

    Description length: no fewer than 250 words

    Format: images in .jpg, annotations in .json, descriptions in .txt

    2. Video Description Data

    Data size: 10 million pairs

    Video type: generic scenes (portraits, landscapes, animals, etc.), ads, TV sports, documentaries

    Resolution: 1080p+

    Description languages: English, Spanish, Portuguese, French, Korean, German, Chinese, Japanese

    Description length: no fewer than 250 words

    Format: .mp4, .mov, .avi, and other common video formats; annotation files in .xlsx

    3. About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data, and 800TB of Annotated Imagery Data. This ready-to-go data supports instant delivery and can quickly improve the accuracy of AI models. For more details, visit https://www.nexdata.ai/datasets/llm?source=Datarade

  2. Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data...

    • data.nexdata.ai
    Updated Feb 4, 2025
    Cite
    Nexdata (2025). Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data | Large Language Model(LLM) Data| AI Datasets [Dataset]. https://data.nexdata.ai/products/nexdata-image-and-video-description-data-1-pb-multimoda-nexdata
    Dataset updated
    Feb 4, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    Poland, New Zealand, South Africa, Dominican Republic, Slovakia, Philippines, Albania, Norway, Australia, Luxembourg
    Description

    Off-the-shelf 1PB image and video description data covers multiple scenes, languages, and domains.

  3. 16kHz Conversational Speech Data | 35,000 Hours | Large Language Model(LLM)...

    • datarade.ai
    Updated Dec 9, 2023
    Cite
    Nexdata (2023). 16kHz Conversational Speech Data | 35,000 Hours | Large Language Model(LLM) Data | Speech AI Datasets|Machine Learning (ML) Data [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-conversational-speech-data-16khz-mob-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Dec 9, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Germany, Saudi Arabia, Ecuador, Canada, Malaysia, Austria, Vietnam, Korea (Republic of), Indonesia, Turkey
    Description
    1. Specifications

    Format : 16kHz, 16bit, uncompressed wav, mono channel;

    Environment : quiet indoor environment, without echo;

    Recording content : no preset script; dozens of topics are specified, and the speakers converse on those topics during recording;

    Demographics : Speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.

    Annotation : transcription text, speaker identification, gender, and noise symbols;

    Device : Android mobile phone, iPhone;

    Language : 100+ Languages;

    Application scenarios : speech recognition; voiceprint recognition;

    Accuracy rate : the word accuracy rate is not less than 98%

    2. About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data, and 800TB of Annotated Imagery Data. This ready-to-go Machine Learning (ML) Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, visit https://www.nexdata.ai/datasets/speechrecog?source=Datarade

  4. 16kHz Conversational Speech Data | 35,000 Hours | Large Language Model(LLM)...

    • data.nexdata.ai
    Updated Aug 3, 2024
    Cite
    Nexdata (2024). 16kHz Conversational Speech Data | 35,000 Hours | Large Language Model(LLM) Data | Speech AI Datasets|Machine Learning (ML) Data [Dataset]. https://data.nexdata.ai/products/nexdata-multilingual-conversational-speech-data-16khz-mob-nexdata
    Dataset updated
    Aug 3, 2024
    Dataset authored and provided by
    Nexdata
    Area covered
    Malaysia, Italy, Bulgaria, Switzerland, Pakistan, Egypt, Hong Kong, Brazil, Ukraine, Syrian Arab Republic
    Description

    Nexdata offers 35,000 hours of off-the-shelf 16kHz conversational speech Machine Learning (ML) Data, covering 100+ languages, including English, German, French, Spanish, Italian, Portuguese, Korean, Japanese, Hindi, and Russian.

  5. In-Cabin Speech Data | 15,000 Hours | AI Training Data | Speech Recognition...

    • datarade.ai
    Updated Apr 23, 2024
    Cite
    Nexdata (2023). In-Cabin Speech Data | 15,000 Hours | AI Training Data | Speech Recognition Data | Audio Data |Natural Language Processing (NLP) Data [Dataset]. https://datarade.ai/data-products/nexdata-in-car-speech-data-15-000-hours-audio-ai-ml-t-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Apr 23, 2024
    Dataset authored and provided by
    Nexdata
    Area covered
    Switzerland, Netherlands, Germany, Poland, Egypt, Argentina, Turkey, Austria, Romania, Russian Federation
    Description
    1. Specifications

    Format : audio format: 48kHz, 16bit, uncompressed wav, mono channel; video format: MP4

    Recording Environment : in-car; 1 quiet scene, 1 low-noise scene, 3 medium-noise scenes, and 2 high-noise scenes

    Recording Content : covers 5 domains: navigation, multimedia, telephone, car control, and question answering; 500 sentences per person

    Speaker : Speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.

    Device : High fidelity microphone; Binocular camera

    Language : 20 languages

    Transcription content : text

    Accuracy rate : 98%

    Application scenarios : speech recognition; human-computer interaction; natural language processing and text analysis; visual content understanding, etc.

    2. About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data, and 800TB of Annotated Imagery Data. This ready-to-go Natural Language Processing (NLP) Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, visit https://www.nexdata.ai/datasets/speechrecog?source=Datarade

  6. Unsupervised Speech Data |1 Million Hours | Spontaneous Speech | LLM |...

    • data.nexdata.ai
    Updated Feb 13, 2025
    Cite
    Nexdata (2025). Unsupervised Speech Data |1 Million Hours | Spontaneous Speech | LLM | Pre-training |Large Language Model(LLM) Data [Dataset]. https://data.nexdata.ai/products/nexdata-multilingual-unsupervised-speech-data-1-million-ho-nexdata
    Dataset updated
    Feb 13, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    France
    Description

    An off-the-shelf unsupervised speech dataset of 1 million hours, covering 10+ languages (English, French, German, Japanese, Arabic, Mandarin, etc., 100,000 hours each). The content covers dialogues and monologues in 28 common domains, such as daily vlogs, travel, podcasts, technology, and beauty.

  7. Scripted Monologues Speech Data | 65,000 Hours | Generative AI Audio Data|...

    • datarade.ai
    Updated Dec 11, 2023
    Cite
    Nexdata (2023). Scripted Monologues Speech Data | 65,000 Hours | Generative AI Audio Data| Speech Recognition Data | Machine Learning (ML) Data [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-read-speech-data-65-000-hours-aud-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Dec 11, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Chile, Luxembourg, Pakistan, Taiwan, France, Italy, Japan, Poland, Puerto Rico, Uruguay
    Description
    1. Specifications

    Format : 16kHz, 16bit, uncompressed wav, mono channel

    Recording environment : quiet indoor environment, without echo

    Recording content (read speech) : economy, entertainment, news, oral language, numbers, letters

    Speaker : native speaker, gender balance

    Device : Android mobile phone, iPhone

    Language : 100+ languages

    Transcription content : text, time point of speech data, 5 noise symbols, 5 special identifiers

    Accuracy rate : 95% (the accuracy rate of noise symbols and other identifiers is not included)

    Application scenarios : speech recognition, voiceprint recognition

    2. About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data, and 800TB of Annotated Imagery Data. This ready-to-go Machine Learning (ML) Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, visit https://www.nexdata.ai/datasets/speechrecog?source=Datarade

  8. SIFT-50M

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    Amazon AGI (2025). SIFT-50M [Dataset]. https://huggingface.co/datasets/amazon-agi/SIFT-50M
    Dataset updated
    Jun 1, 2025
    Dataset authored and provided by
    Amazon AGI
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Dataset Card for SIFT-50M

    SIFT-50M (Speech Instruction Fine-Tuning) is a 50-million-example dataset designed for instruction fine-tuning and pre-training of speech-text large language models (LLMs). It is built from publicly available speech corpora containing a total of 14K hours of speech and leverages LLMs and off-the-shelf expert models. The dataset spans five languages, covering diverse aspects of speech understanding and controllable speech generation instructions. SIFT-50M… See the full description on the dataset page: https://huggingface.co/datasets/amazon-agi/SIFT-50M.

  9. Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI Training...

    • datarade.ai
    Updated Dec 10, 2023
    Cite
    Nexdata (2023). Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI Training Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-speech-synthesis-data-400-hours-a-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Dec 10, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Colombia, Malaysia, Belgium, Philippines, Canada, Austria, Finland, Singapore, Hong Kong, Sweden
    Description
    1. Specifications

    Format : 44.1 kHz/48 kHz, 16bit/24bit, uncompressed wav, mono channel.

    Recording environment : professional recording studio.

    Recording content : general narrative sentences, interrogative sentences, etc.

    Speaker : native speaker

    Annotation Feature : word transcription, part-of-speech, phoneme boundary, four-level accents, four-level prosodic boundary.

    Device : Microphone

    Language : American English, British English, Japanese, French, Dutch, Cantonese, Canadian French, Australian English, Italian, New Zealand English, Spanish, Mexican Spanish

    Application scenarios : speech synthesis

    Accuracy rate :

    Word transcription: the sentence accuracy rate is not less than 99%.

    Part-of-speech annotation: the sentence accuracy rate is not less than 98%.

    Phoneme annotation: the sentence accuracy rate is not less than 98% (errors in voiced and swallowed phonemes are excluded, because their labelling is more subjective).

    Accent annotation: the word accuracy rate is not less than 95%.

    Prosodic boundary annotation: the sentence accuracy rate is not less than 97%.

    Phoneme boundary annotation: the phoneme accuracy rate is not less than 95% (the error range of the boundary is within 5%).

    2. About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data, and 800TB of Annotated Imagery Data. This ready-to-go AI & ML Training Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, visit https://www.nexdata.ai/datasets/tts?source=Datarade

  10. WebR-Pro-100k

    • huggingface.co
    Updated Apr 26, 2025
    + more versions
    Cite
    Yuxin Jiang (2025). WebR-Pro-100k [Dataset]. https://huggingface.co/datasets/YuxinJiang/WebR-Pro-100k
    Dataset updated
    Apr 26, 2025
    Authors
    Yuxin Jiang
    Description

    Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction (ACL 2025)

    arXiv link: https://arxiv.org/abs/2504.15573

    GitHub: https://github.com/YJiangcm/WebR

    Leveraging an off-the-shelf LLM, WebR transforms raw web documents into high-quality instruction-response pairs. It strategically assigns each document as either an instruction or a response to trigger the process of web reconstruction. We released our generated datasets on Huggingface:

    Dataset Generator Size… See the full description on the dataset page: https://huggingface.co/datasets/YuxinJiang/WebR-Pro-100k.

  11. 8kHz Conversational Speech Data | 15,000 Hours | Audio Data | Speech...

    • datarade.ai
    Updated Dec 10, 2023
    Cite
    Nexdata (2023). 8kHz Conversational Speech Data | 15,000 Hours | Audio Data | Speech Recognition Data| Machine Learning (ML) Data [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-conversational-speech-data-8khz-tele-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Dec 10, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Czech Republic, United Arab Emirates, Argentina, United States of America, Romania, Philippines, Poland, Vietnam, Singapore, Netherlands
    Description
    1. Specifications

    Format : 8kHz, 8bit, u-law/a-law pcm, mono channel;

    Environment : quiet indoor environment, without echo;

    Recording content : no preset script; dozens of topics are specified, and the speakers converse on those topics during recording;

    Demographics : Speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.

    Annotation : transcription text, speaker identification, gender, and noise symbols;

    Device : Telephony recording system;

    Language : 100+ Languages;

    Application scenarios : speech recognition; voiceprint recognition;

    Accuracy rate : the word accuracy rate is not less than 98%

    2. About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data, and 800TB of Annotated Imagery Data. This ready-to-go Machine Learning (ML) Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, visit https://www.nexdata.ai/datasets/speechrecog?source=Datarade

  12. Mixed Speech Data |5,000 Hours |Code-switching|Audio Data| Speech...

    • datarade.ai
    Updated Apr 23, 2024
    Cite
    Nexdata (2024). Mixed Speech Data |5,000 Hours |Code-switching|Audio Data| Speech Recognition Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-code-switching-speech-data-5-000-hou-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Apr 23, 2024
    Dataset authored and provided by
    Nexdata
    Area covered
    Taiwan, Germany, Italy, New Zealand, Korea (Republic of), Hong Kong, Tunisia, France, Algeria, Mexico
    Description
    1. Specifications

    Format : 16kHz, 16bit, uncompressed wav, mono channel

    Recording environment : quiet indoor environment, without echo

    Recording content (read speech) : general category; human-machine interaction category

    Demographics : Speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.

    Device : Android mobile phone, iPhone;

    Language : English-Korean, English-Japanese, German-English, Hong Kong Cantonese-English, Taiwanese-English,

    Application scenarios : speech recognition; voiceprint recognition.

    Accuracy rate : 97%

    2. About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data, and 800TB of Annotated Imagery Data. This ready-to-go Natural Language Processing (NLP) Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, visit https://www.nexdata.ai/datasets/speechrecog?source=Datarade

  13. MixEval Dataset

    • paperswithcode.com
    Updated May 18, 2025
    Cite
    Jinjie Ni; Fuzhao Xue; Xiang Yue; Yuntian Deng; Mahir Shah; Kabir Jain; Graham Neubig; Yang You (2025). MixEval Dataset [Dataset]. https://paperswithcode.com/dataset/mixeval
    Dataset updated
    May 18, 2025
    Authors
    Jinjie Ni; Fuzhao Xue; Xiang Yue; Yuntian Deng; Mahir Shah; Kabir Jain; Graham Neubig; Yang You
    Description

    MixEval is a ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures, which evaluates LLMs with a highly capable model ranking (i.e., 0.96 correlation with Chatbot Arena) while running locally and quickly (6% the time and cost of running MMLU), with its queries being stably and effortlessly updated every month to avoid contamination.

  14. Unscripted Call Center Telephony Speech Data | 20,000 Hours |Speech...

    • datarade.ai
    Updated Feb 26, 2025
    Cite
    Nexdata (2025). Unscripted Call Center Telephony Speech Data | 20,000 Hours |Speech Recognition Data| Speech AI Datasets [Dataset]. https://datarade.ai/data-products/unscripted-call-center-telephony-speech-data-20-000-hours-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Feb 26, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    Netherlands, Uruguay, Luxembourg, Canada, Chile, Macao, Australia, Brazil, South Africa, Denmark
    Description
    1. Overview

    Format: 8kHz, 16bit, wav, mono channel

    Recording condition: Phone recording system, with low background noise (call center scenario)

    Recording content: Spontaneous inbound and outbound calls in typical domains, such as finance, real estate, sales, health, insurance, and telecom

    Language: English, German, French, Spanish, Italian, Portuguese, Korean, Japanese, Hindi, Arabic, Dutch, Swedish, Norwegian, etc.

    Features of annotation: Transcription text, timestamp, speaker ID, gender, noise, PII redacted

    Accuracy: Word Accuracy Rate (WAR) 98%

    2. About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data, and 800TB of Annotated Imagery Data. This ready-to-go Machine Learning (ML) Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, visit https://www.nexdata.ai/datasets/speechrecog?source=Datarade

  15. Re-ID Data | 600,000 ID | CCTV Data |Computer Vision Data| Identity Data| AI...

    • datarade.ai
    Updated Dec 8, 2023
    Cite
    Nexdata (2023). Re-ID Data | 600,000 ID | CCTV Data |Computer Vision Data| Identity Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-re-id-data-60-000-id-image-video-ai-ml-train-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Dec 8, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Luxembourg, Ecuador, Sri Lanka, United Arab Emirates, Russian Federation, Trinidad and Tobago, Turkmenistan, Bolivia (Plurinational State of), Portugal, Cuba
    Description
    1. Specifications

    Data size : 60,000 ID

    Population distribution : the race distribution is Asians, Caucasians and black people, the gender distribution is male and female, the age distribution is from children to the elderly

    Collecting environment : including indoor and outdoor scenes (such as supermarket, mall and residential area, etc.)

    Data diversity : different ages, different time periods, different cameras, different human body orientations and postures, different collecting environments

    Device : surveillance cameras; the image resolution is not less than 1,920 × 1,080

    Data format : the image data format is .jpg, the annotation file format is .json

    Annotation content : human body rectangular bounding boxes, 15 human body attributes

    Quality Requirements : a rectangular bounding box of a human body is qualified when its deviation is not more than 3 pixels, and the qualified rate of the bounding boxes shall not be lower than 97%; the annotation accuracy of attributes is over 97%

    2. About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data, and 800TB of Annotated Imagery Data. This ready-to-go Identity Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, visit https://www.nexdata.ai/datasets/computervision?source=Datarade

  16. Driver & Passenger Behavior Data | 100,000 ID | DMS & OMS Data| Image AI...

    • datarade.ai
    Updated Dec 8, 2023
    Cite
    Nexdata (2023). Driver & Passenger Behavior Data | 100,000 ID | DMS & OMS Data| Image AI Training Data | Annotated Imagery Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-driver-passenger-behavior-data-100-000-id-im-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Dec 8, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Armenia, Greece, Uzbekistan, Trinidad and Tobago, Cuba, Serbia, Japan, Korea (Republic of), Denmark, Malta
    Description
    1. Specifications

    Data size : 100,000 ID

    Population distribution : gender distribution: balanced; race distribution: Caucasians, blacks, Indians, Asians; age distribution: aged from 18 to 60

    Collection environment : In-car Cameras

    Collection diversity : multiple races, multiple age periods, multiple time periods and behaviors (Dangerous behavior, Fatigue behavior, Visual movement behavior)

    Device : binocular camera with RGB and infrared channels; the resolution is 640x480

    Collection time : day, evening and night

    Image parameter : the video format is .avi

    Accuracy : the accuracy of each person's actions is greater than 95%; the accuracy of label annotation is not less than 95%

    2. About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data, and 800TB of Annotated Imagery Data. This ready-to-go Annotated Imagery Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, visit https://www.nexdata.ai/datasets/computervision?source=Datarade

  17. Multi-race Online Conference Video Data | 20000 ID | Annotated Imagery Data

    • datarade.ai
    Updated Jan 2, 2025
    Cite
    Nexdata (2025). Multi-race Online Conference Video Data | 20000 ID | Annotated Imagery Data [Dataset]. https://datarade.ai/data-products/nexdata-multi-race-online-conference-video-data-20000-id-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Jan 2, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    Oman, Australia, Kyrgyzstan, Cambodia, Ireland, Brazil, Singapore, Paraguay, France, Albania
    Description
    1. Specifications

    Data size : 20,000 ID

    Race distribution : Asian, Caucasian, Black, Brown

    Gender distribution : male, female

    Age distribution : from teenagers to the elderly, mainly young and middle-aged

    Collection environment : indoor office scenes, such as meeting rooms, coffee shops, libraries, bedrooms, etc.

    Collection diversity : diverse coverage of races, age groups and scenes

    Collection equipment : cellphone, using the cellphone to simulate the perspective of the laptop camera in online conference scenes

    Data format : .mp4, .mov

    Accuracy rate : the accuracy exceeds 97%

    2. About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of speech data, and 800TB of Annotated Imagery Data. This ready-to-go data supports instant delivery and can quickly improve the accuracy of AI models. For more details, visit https://www.nexdata.ai/datasets/computervision?source=Datarade

  18. Face Anti-spoofing Data | 200,000 ID | iBeta Dataset| Liveness Detection...

    • datarade.ai
    Updated Dec 21, 2023
    Cite
    Nexdata (2023). Face Anti-spoofing Data | 200,000 ID | iBeta Dataset| Liveness Detection Data| Image/Video Machine Learning (ML) Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-face-anti-spoofing-data-200-000-id-image-video-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Dec 21, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Dominican Republic, Malta, Slovakia, United Kingdom, Tunisia, Colombia, Hungary, Iraq, Pakistan, South Africa
    Description
    1. Specifications

    Data size : 200,000 ID

    Population distribution : race distribution: Asians, Caucasians, black people; gender distribution: gender balance; age distribution: from children to the elderly, mostly young and middle-aged people

    Collection environment : indoor scenes, outdoor scenes

    Collection diversity : various postures, expressions, lighting conditions, scenes, time periods, and distances

    Collection device : iPhone, Android phone, iPad

    Collection time : daytime, night

    Image Parameter : the video format is .mov or .mp4, the image format is .jpg

    Accuracy : the accuracy of actions exceeds 97%

    2. About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data, and 800TB of Annotated Imagery Data. This ready-to-go machine learning (ML) data supports instant delivery and can quickly improve the accuracy of AI models. For more details, visit https://www.nexdata.ai/datasets/computervision?source=Datarade

  19. Native & Accented English Speech Data |40,000 Hours | Audio Data|Speech...

    • datarade.ai
    Updated Jun 25, 2024
    Cite
    Nexdata (2024). Native & Accented English Speech Data |40,000 Hours | Audio Data|Speech Recognition Data| Natural Language Processing (NLP) Data [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-native-accented-english-speech-data-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Jun 25, 2024
    Dataset authored and provided by
    Nexdata
    Area covered
    Morocco, Turkey, Sweden, Myanmar, Egypt, Taiwan, Denmark, Pakistan, United Kingdom, Macao
    Description
    1. Specifications

    Format : 16kHz, 16bit, uncompressed wav, mono channel.

    Recording environment : quiet indoor environment, low background noise, without echo.

    Recording content (read speech) : generic category; human-machine interaction category; smart home command and control category; in-car command and control category; numbers.

    Demographics : Speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.

    Device : Android mobile phone, iPhone.

    Language : American English, British English, Canadian English, Australian English, French English, German English, Spanish English, Italian English, Portuguese English, Russian English, Indian English, Japanese English, Korean English, Singaporean English, etc.

    Application scenarios : speech recognition; voiceprint recognition.

    2. About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data, and 800TB of Annotated Imagery Data. This ready-to-go Machine Learning (ML) Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, visit https://www.nexdata.ai/datasets/speechrecog?source=Datarade

  20. Real-world Casual Conversation and Monologue Speech Data | 20,000 Hours |...

    • datarade.ai
    Updated Jan 2, 2025
    Cite
    Nexdata (2025). Real-world Casual Conversation and Monologue Speech Data | 20,000 Hours | Spontaneous Speech |Audio Data [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-casual-conversation-speech-data-20-0-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Jan 2, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    Japan, Belgium, Romania, Korea (Republic of), Russian Federation, Hong Kong, Thailand, Argentina, Italy, Canada
    Description
    1. Specifications

    Format: 16kHz, 16bit, wav, mono channel;

    Recording environment: Low background noise;

    Recording content: live, variety-show, and speech content, etc.;

    Language: English, French, German, Japanese, Portuguese, Dutch, Turkish, Korean, Vietnamese, Indonesian, Malay, Thai, Burmese, Arabic, etc.

    Features of annotation: Transcription text, timestamp, speaker ID, gender, noise

    Accuracy rate: Word Accuracy Rate (WAR) 98%

    2. About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data, and 800TB of Annotated Imagery Data. This ready-to-go data supports instant delivery and can quickly improve the accuracy of AI models. For more details, visit https://www.nexdata.ai/datasets/speechrecog?source=Datarade