20 datasets found

Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data...
datarade.ai
Updated Jan 3, 2025
Cite
Nexdata (2025). Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data | Large Language Model(LLM) Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-image-and-video-description-data-1-pb-multimoda-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Jan 3, 2025
Dataset authored and provided by
Nexdata
Area covered
Belgium, Ecuador, Czech Republic, Canada, Netherlands, Finland, Malta, Mexico, United Arab Emirates, Israel
Description
Image Description Data

Data size: 500 million pairs

Image type: generic scenes (portraits, landscapes, animals, etc.), human action, picture books, magazines, PPT and charts, app screenshots, etc.

Resolution: 4K+

Description language: English, Spanish, Portuguese, French, Korean, German, Chinese, Japanese

Description length: no less than 250 words

Format: images in .jpg, annotations in .json, descriptions in .txt

Video Description Data

Data size: 10 million pairs

Video type: generic scenes (portraits, landscapes, animals, etc.), ads, TV sports, documentaries

Resolution: 1080p+

Description language: English, Spanish, Portuguese, French, Korean, German, Chinese, Japanese

Description length: no less than 250 words

Format: .mp4, .mov, .avi, and other common video formats; .xlsx (annotation file format)

About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. This ready-to-go data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/llm?source=Datarade
Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data...
data.nexdata.ai
Updated Feb 4, 2025
Cite
Nexdata (2025). Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data | Large Language Model(LLM) Data| AI Datasets [Dataset]. https://data.nexdata.ai/products/nexdata-image-and-video-description-data-1-pb-multimoda-nexdata
Explore at:
Dataset updated
Feb 4, 2025
Dataset authored and provided by
Nexdata
Area covered
Poland, New Zealand, South Africa, Dominican Republic, Slovakia, Philippines, Albania, Norway, Australia, Luxembourg
Description
Off-the-shelf 1PB image and video description data covers multiple scenes, languages, and domains.
16kHz Conversational Speech Data | 35,000 Hours | Large Language Model(LLM)...
datarade.ai
Updated Dec 9, 2023
Cite
Nexdata (2023). 16kHz Conversational Speech Data | 35,000 Hours | Large Language Model(LLM) Data | Speech AI Datasets|Machine Learning (ML) Data [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-conversational-speech-data-16khz-mob-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Dec 9, 2023
Dataset authored and provided by
Nexdata
Area covered
Germany, Saudi Arabia, Ecuador, Canada, Malaysia, Austria, Vietnam, Korea (Republic of), Indonesia, Turkey
Description
Specifications

Format: 16kHz, 16bit, uncompressed wav, mono channel

Environment: quiet indoor environment, without echo

Recording content: no preset script; dozens of topics are specified, and the speakers converse on those topics while being recorded

Demographics: speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.

Annotation: transcription text, speaker identification, gender, and noise symbols

Device: Android mobile phone, iPhone

Language: 100+ languages

Application scenarios: speech recognition; voiceprint recognition

Accuracy rate: the word accuracy rate is not less than 98%

About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. This ready-to-go Machine Learning (ML) Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade
16kHz Conversational Speech Data | 35,000 Hours | Large Language Model(LLM)...
data.nexdata.ai
Updated Aug 3, 2024
Cite
Nexdata (2024). 16kHz Conversational Speech Data | 35,000 Hours | Large Language Model(LLM) Data | Speech AI Datasets|Machine Learning (ML) Data [Dataset]. https://data.nexdata.ai/products/nexdata-multilingual-conversational-speech-data-16khz-mob-nexdata
Explore at:
Dataset updated
Aug 3, 2024
Dataset authored and provided by
Nexdata
Area covered
Malaysia, Italy, Bulgaria, Switzerland, Pakistan, Egypt, Hong Kong, Brazil, Ukraine, Syrian Arab Republic
Description
Nexdata offers 35,000 hours of off-the-shelf 16kHz conversational speech Machine Learning (ML) Data, covering 100+ languages including English, German, French, Spanish, Italian, Portuguese, Korean, Japanese, Hindi, Russian, etc.
In-Cabin Speech Data | 15,000 Hours | AI Training Data | Speech Recognition...
datarade.ai
Updated Apr 23, 2024
Cite
Nexdata (2023). In-Cabin Speech Data | 15,000 Hours | AI Training Data | Speech Recognition Data | Audio Data |Natural Language Processing (NLP) Data [Dataset]. https://datarade.ai/data-products/nexdata-in-car-speech-data-15-000-hours-audio-ai-ml-t-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Apr 23, 2024
Dataset authored and provided by
Nexdata
Area covered
Switzerland, Netherlands, Germany, Poland, Egypt, Argentina, Turkey, Austria, Romania, Russian Federation
Description
Specifications

Format: audio in 48kHz, 16bit, uncompressed wav, mono channel; video in MP4

Recording environment: in-car; 1 quiet scene, 1 low-noise scene, 3 medium-noise scenes, and 2 high-noise scenes

Recording content: covers 5 fields (navigation, multimedia, telephone, car control, and question and answer); 500 sentences per person

Speaker: speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.

Device: high-fidelity microphone; binocular camera

Language: 20 languages

Transcription content: text

Accuracy rate: 98%

Application scenarios: speech recognition; human-computer interaction; natural language processing and text analysis; visual content understanding, etc.

About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. This ready-to-go Natural Language Processing (NLP) Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade
Unsupervised Speech Data |1 Million Hours | Spontaneous Speech | LLM |...
data.nexdata.ai
Updated Feb 13, 2025
Cite
Nexdata (2025). Unsupervised Speech Data |1 Million Hours | Spontaneous Speech | LLM | Pre-training |Large Language Model(LLM) Data [Dataset]. https://data.nexdata.ai/products/nexdata-multilingual-unsupervised-speech-data-1-million-ho-nexdata
Explore at:
Dataset updated
Feb 13, 2025
Dataset authored and provided by
Nexdata
Area covered
France
Description
An off-the-shelf unsupervised speech dataset of 1 million hours, covering 10+ languages (English, French, German, Japanese, Arabic, Mandarin, etc., 100,000 hours each). The content covers dialogues and monologues in 28 common domains, such as daily vlogs, travel, podcasts, technology, and beauty.
Scripted Monologues Speech Data | 65,000 Hours | Generative AI Audio Data|...
datarade.ai
Updated Dec 11, 2023
Cite
Nexdata (2023). Scripted Monologues Speech Data | 65,000 Hours | Generative AI Audio Data| Speech Recognition Data | Machine Learning (ML) Data [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-read-speech-data-65-000-hours-aud-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Dec 11, 2023
Dataset authored and provided by
Nexdata
Area covered
Chile, Luxembourg, Pakistan, Taiwan, France, Italy, Japan, Poland, Puerto Rico, Uruguay
Description
Specifications

Format: 16kHz, 16bit, uncompressed wav, mono channel

Recording environment: quiet indoor environment, without echo

Recording content (read speech): economy, entertainment, news, oral language, numbers, letters

Speaker: native speakers, gender-balanced

Device: Android mobile phone, iPhone

Language: 100+ languages

Transcription content: text, time point of speech data, 5 noise symbols, 5 special identifiers

Accuracy rate: 95% (the accuracy rate of noise symbols and other identifiers is not included)

Application scenarios: speech recognition, voiceprint recognition

About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. This ready-to-go Machine Learning (ML) Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade
SIFT-50M
huggingface.co
Updated Jun 1, 2025
Cite
Amazon AGI (2025). SIFT-50M [Dataset]. https://huggingface.co/datasets/amazon-agi/SIFT-50M
Explore at:
Dataset updated
Jun 1, 2025
Dataset authored and provided by
Amazon AGI
License
https://choosealicense.com/licenses/cdla-sharing-1.0/
Description
Dataset Card for SIFT-50M

SIFT-50M (Speech Instruction Fine-Tuning) is a 50-million-example dataset designed for instruction fine-tuning and pre-training of speech-text large language models (LLMs). It is built from publicly available speech corpora containing a total of 14K hours of speech and leverages LLMs and off-the-shelf expert models. The dataset spans five languages, covering diverse aspects of speech understanding and controllable speech generation instructions. SIFT-50M… See the full description on the dataset page: https://huggingface.co/datasets/amazon-agi/SIFT-50M.
Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI Training...
datarade.ai
Updated Dec 10, 2023
Cite
Nexdata (2023). Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI Training Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-speech-synthesis-data-400-hours-a-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Dec 10, 2023
Dataset authored and provided by
Nexdata
Area covered
Colombia, Malaysia, Belgium, Philippines, Canada, Austria, Finland, Singapore, Hong Kong, Sweden
Description
Specifications

Format: 44.1 kHz/48 kHz, 16bit/24bit, uncompressed wav, mono channel

Recording environment: professional recording studio

Recording content: general narrative sentences, interrogative sentences, etc.

Speaker: native speakers

Annotation feature: word transcription, part-of-speech, phoneme boundary, four-level accents, four-level prosodic boundary

Device: microphone

Language: American English, British English, Japanese, French, Dutch, Cantonese, Canadian French, Australian English, Italian, New Zealand English, Spanish, Mexican Spanish

Application scenarios: speech synthesis

Accuracy rate:
Word transcription: the sentence accuracy rate is not less than 99%.
Part-of-speech annotation: the sentence accuracy rate is not less than 98%.
Phoneme annotation: the sentence accuracy rate is not less than 98% (the error rate of voiced and swallowed phonemes is not included, because the labelling is more subjective).
Accent annotation: the word accuracy rate is not less than 95%.
Prosodic boundary annotation: the sentence accuracy rate is not less than 97%.
Phoneme boundary annotation: the phoneme accuracy rate is not less than 95% (the boundary error range is within 5%).

About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. This ready-to-go AI & ML Training Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/tts?source=Datarade
WebR-Pro-100k
huggingface.co
Updated Apr 26, 2025
+ more versions
Cite
Yuxin Jiang (2025). WebR-Pro-100k [Dataset]. https://huggingface.co/datasets/YuxinJiang/WebR-Pro-100k
Explore at:
Dataset updated
Apr 26, 2025
Authors
Yuxin Jiang
Description
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction (ACL 2025)

arXiv link: https://arxiv.org/abs/2504.15573
GitHub: https://github.com/YJiangcm/WebR

Leveraging an off-the-shelf LLM, WebR transforms raw web documents into high-quality instruction-response pairs. It strategically assigns each document as either an instruction or a response to trigger the process of web reconstruction. We released our generated datasets on Hugging Face:

Dataset Generator Size… See the full description on the dataset page: https://huggingface.co/datasets/YuxinJiang/WebR-Pro-100k.
8kHz Conversational Speech Data | 15,000 Hours | Audio Data | Speech...
datarade.ai
Updated Dec 10, 2023
Cite
Nexdata (2023). 8kHz Conversational Speech Data | 15,000 Hours | Audio Data | Speech Recognition Data| Machine Learning (ML) Data [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-conversational-speech-data-8khz-tele-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Dec 10, 2023
Dataset authored and provided by
Nexdata
Area covered
Czech Republic, United Arab Emirates, Argentina, United States of America, Romania, Philippines, Poland, Vietnam, Singapore, Netherlands
Description
Specifications

Format: 8kHz, 8bit, u-law/a-law pcm, mono channel

Environment: quiet indoor environment, without echo

Recording content: no preset script; dozens of topics are specified, and the speakers converse on those topics while being recorded

Demographics: speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.

Annotation: transcription text, speaker identification, gender, and noise symbols

Device: telephony recording system

Language: 100+ languages

Application scenarios: speech recognition; voiceprint recognition

Accuracy rate: the word accuracy rate is not less than 98%

About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. This ready-to-go Machine Learning (ML) Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade
Mixed Speech Data |5,000 Hours |Code-switching|Audio Data| Speech...
datarade.ai
Updated Apr 23, 2024
Cite
Nexdata (2024). Mixed Speech Data |5,000 Hours |Code-switching|Audio Data| Speech Recognition Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-code-switching-speech-data-5-000-hou-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Apr 23, 2024
Dataset authored and provided by
Nexdata
Area covered
Taiwan, Germany, Italy, New Zealand, Korea (Republic of), Hong Kong, Tunisia, France, Algeria, Mexico
Description
Specifications

Format: 16kHz, 16bit, uncompressed wav, mono channel

Recording environment: quiet indoor environment, without echo

Recording content (read speech): general category; human-machine interaction category

Demographics: speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.

Device: Android mobile phone, iPhone

Language: English-Korean, English-Japanese, German-English, Hong Kong Cantonese-English, Taiwanese-English

Application scenarios: speech recognition; voiceprint recognition

Accuracy rate: 97%

About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. This ready-to-go Natural Language Processing (NLP) Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade
MixEval Dataset
paperswithcode.com
Updated May 18, 2025
Cite
Jinjie Ni; Fuzhao Xue; Xiang Yue; Yuntian Deng; Mahir Shah; Kabir Jain; Graham Neubig; Yang You (2025). MixEval Dataset [Dataset]. https://paperswithcode.com/dataset/mixeval
Explore at:
Dataset updated
May 18, 2025
Authors
Jinjie Ni; Fuzhao Xue; Xiang Yue; Yuntian Deng; Mahir Shah; Kabir Jain; Graham Neubig; Yang You
Description
MixEval is a ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures, which evaluates LLMs with a highly capable model ranking (i.e., 0.96 correlation with Chatbot Arena) while running locally and quickly (6% the time and cost of running MMLU), with its queries being stably and effortlessly updated every month to avoid contamination.
Unscripted Call Center Telephony Speech Data | 20,000 Hours |Speech...
datarade.ai
Updated Feb 26, 2025
Cite
Nexdata (2025). Unscripted Call Center Telephony Speech Data | 20,000 Hours |Speech Recognition Data| Speech AI Datasets [Dataset]. https://datarade.ai/data-products/unscripted-call-center-telephony-speech-data-20-000-hours-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Feb 26, 2025
Dataset authored and provided by
Nexdata
Area covered
Netherlands, Uruguay, Luxembourg, Canada, Chile, Macao, Australia, Brazil, South Africa, Denmark
Description
Overview

Format: 8kHz, 16bit, wav, mono channel

Recording condition: phone recording system, with low background noise (call center scenario)

Recording content: spontaneous inbound and outbound calls in typical domains, such as finance, real estate, sales, health, insurance, and telecom

Language: English, German, French, Spanish, Italian, Portuguese, Korean, Japanese, Hindi, Arabic, Dutch, Swedish, Norwegian, etc.

Features of annotation: transcription text, timestamp, speaker ID, gender, noise, PII redacted

Accuracy: Word Accuracy Rate (WAR) 98%

About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. This ready-to-go Machine Learning (ML) Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade
Re-ID Data | 600,000 ID | CCTV Data |Computer Vision Data| Identity Data| AI...
datarade.ai
Updated Dec 8, 2023
Cite
Nexdata (2023). Re-ID Data | 600,000 ID | CCTV Data |Computer Vision Data| Identity Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-re-id-data-60-000-id-image-video-ai-ml-train-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Dec 8, 2023
Dataset authored and provided by
Nexdata
Area covered
Luxembourg, Ecuador, Sri Lanka, United Arab Emirates, Russian Federation, Trinidad and Tobago, Turkmenistan, Bolivia (Plurinational State of), Portugal, Cuba
Description
Specifications

Data size: 60,000 ID

Population distribution: race distribution: Asian, Caucasian, Black; gender distribution: male and female; age distribution: from children to the elderly

Collecting environment: indoor and outdoor scenes (such as supermarkets, malls, and residential areas)

Data diversity: different ages, different time periods, different cameras, different human body orientations and postures, different collecting environments

Device: surveillance cameras; the image resolution is not less than 1,920×1,080

Data format: the image data format is .jpg, the annotation file format is .json

Annotation content: human body rectangular bounding boxes, 15 human body attributes

Quality requirements: a rectangular bounding box of a human body is qualified when the deviation is not more than 3 pixels, and the qualified rate of the bounding boxes shall not be lower than 97%; annotation accuracy of attributes is over 97%

About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. This ready-to-go Identity Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/computervision?source=Datarade
Driver & Passenger Behavior Data | 100,000 ID | DMS & OMS Data| Image AI...
datarade.ai
Updated Dec 8, 2023
Cite
Nexdata (2023). Driver & Passenger Behavior Data | 100,000 ID | DMS & OMS Data| Image AI Training Data | Annotated Imagery Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-driver-passenger-behavior-data-100-000-id-im-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Dec 8, 2023
Dataset authored and provided by
Nexdata
Area covered
Armenia, Greece, Uzbekistan, Trinidad and Tobago, Cuba, Serbia, Japan, Korea (Republic of), Denmark, Malta
Description
Specifications

Data size: 100,000 ID

Population distribution: gender distribution: gender-balanced; race distribution: Caucasian, Black, Indian, Asian; age distribution: aged from 18 to 60

Collection environment: in-car cameras

Collection diversity: multiple races, multiple age periods, multiple time periods and behaviors (dangerous behavior, fatigue behavior, visual movement behavior)

Device: binocular camera with RGB and infrared channels; the resolution is 640x480

Collection time: day, evening and night

Image parameter: the video format is .avi

Accuracy: the accuracy of each person's actions is greater than 95%; the accuracy of label annotation is not less than 95%

About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. This ready-to-go Annotated Imagery Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/computervision?source=Datarade
Multi-race Online Conference Video Data | 20000 ID | Annotated Imagery Data
datarade.ai
Updated Jan 2, 2025
Cite
Nexdata (2025). Multi-race Online Conference Video Data | 20000 ID | Annotated Imagery Data [Dataset]. https://datarade.ai/data-products/nexdata-multi-race-online-conference-video-data-20000-id-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Jan 2, 2025
Dataset authored and provided by
Nexdata
Area covered
Oman, Australia, Kyrgyzstan, Cambodia, Ireland, Brazil, Singapore, Paraguay, France, Albania
Description
Specifications

Data size : 20,000 ID

Race distribution : Asian, Caucasian, Black, Brown

Gender distribution : male, female

Age distribution : from teenagers to the elderly, mainly young and middle-aged

Collection environment : indoor office scenes, such as meeting rooms, coffee shops, libraries, bedrooms, etc.

Collection diversity : diverse coverage of races, age groups and scenes

Collection equipment : cellphone, using the cellphone to simulate the perspective of the laptop camera in online conference scenes

Data format : .mp4, .mov

Accuracy rate : the accuracy exceeds 97%

About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of speech data and 800TB of Annotated Imagery Data. This ready-to-go data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/computervision?source=Datarade
Face Anti-spoofing Data | 200,000 ID | iBeta Dataset| Liveness Detection...
datarade.ai
Updated Dec 21, 2023
Cite
Nexdata (2023). Face Anti-spoofing Data | 200,000 ID | iBeta Dataset| Liveness Detection Data| Image/Video Machine Learning (ML) Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-face-anti-spoofing-data-200-000-id-image-video-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Dec 21, 2023
Dataset authored and provided by
Nexdata
Area covered
Dominican Republic, Malta, Slovakia, United Kingdom, Tunisia, Colombia, Hungary, Iraq, Pakistan, South Africa
Description
Specifications

Data size: 200,000 ID

Population distribution: race distribution: Asian, Caucasian, Black; gender distribution: gender-balanced; age distribution: from children to the elderly, mostly young and middle-aged people

Collection environment: indoor scenes, outdoor scenes

Collection diversity: various postures, expressions, lighting conditions, scenes, time periods and distances

Collection device: iPhone, Android phone, iPad

Collection time: daytime, night

Image parameter: the video format is .mov or .mp4, the image format is .jpg

Accuracy: the accuracy of actions exceeds 97%

About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. This ready-to-go machine learning (ML) data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/computervision?source=Datarade
Native & Accented English Speech Data |40,000 Hours | Audio Data|Speech...
datarade.ai
Updated Jun 25, 2024
Cite
Nexdata (2024). Native & Accented English Speech Data |40,000 Hours | Audio Data|Speech Recognition Data| Natural Language Processing (NLP) Data [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-native-accented-english-speech-data-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Jun 25, 2024
Dataset authored and provided by
Nexdata
Area covered
Morocco, Turkey, Sweden, Myanmar, Egypt, Taiwan, Denmark, Pakistan, United Kingdom, Macao
Description
Specifications

Format: 16kHz, 16bit, uncompressed wav, mono channel

Recording environment: quiet indoor environment, low background noise, without echo

Recording content (read speech): generic category; human-machine interaction category; smart home command and control category; in-car command and control category; numbers

Demographics: speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.

Device: Android mobile phone, iPhone

Language: American English, British English, Canadian English, Australian English, French English, German English, Spanish English, Italian English, Portuguese English, Russian English, Indian English, Japanese English, Korean English, Singaporean English, etc.

Application scenarios: speech recognition; voiceprint recognition

About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. This ready-to-go Machine Learning (ML) Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade
Real-world Casual Conversation and Monologue Speech Data | 20,000 Hours |...
datarade.ai
Updated Jan 2, 2025
Cite
Nexdata (2025). Real-world Casual Conversation and Monologue Speech Data | 20,000 Hours | Spontaneous Speech |Audio Data [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-casual-conversation-speech-data-20-0-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Jan 2, 2025
Dataset authored and provided by
Nexdata
Area covered
Japan, Belgium, Romania, Korea (Republic of), Russian Federation, Hong Kong, Thailand, Argentina, Italy, Canada
Description
Specifications

Format: 16kHz, 16bit, wav, mono channel

Recording environment: low background noise

Recording content: live broadcasts, variety shows, speeches, etc.

Language: English, French, German, Japanese, Portuguese, Dutch, Turkish, Korean, Vietnamese, Indonesian, Malay, Thai, Burmese, Arabic, etc.

Features of annotation: transcription text, timestamp, speaker ID, gender, noise

Accuracy rate: Word Accuracy Rate (WAR) 98%

About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. This ready-to-go data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade