Unidata’s Brain MRI dataset offers unique MRI scans and radiologist reports, aiding AI in detecting and diagnosing brain pathologies
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The dataset comprises 199,955 images of 28,565 individuals displaying a variety of facial expressions. It is designed for research in emotion recognition and facial expression analysis across diverse races, genders, and ages.
By utilizing this dataset, researchers and developers can enhance their understanding of facial recognition technology and improve the accuracy of emotion classification systems.
This dataset includes images that capture different emotions, such as happiness, sadness, surprise, anger, disgust, and fear, allowing researchers to develop and evaluate recognition algorithms and detection methods.
Researchers can leverage this dataset to explore various learning methods and algorithms aimed at improving emotion detection and facial expression recognition.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The dataset includes 53,800+ videos and 9,300+ images captured via webcams from 2,300+ individuals. It is designed to study social interactions and behaviors in various remote meetings, including video calls, video conferencing, and online meetings.
By leveraging this dataset, developers and researchers can enhance their understanding of human behavior in digital communication settings, contributing to advancements in technology and software designed for remote collaboration.
The dataset reports >97% accuracy in action recognition (including actions such as sitting, typing, and gesturing) and ≥97% precision in action labeling, making it a reliable resource for studying human behavior in webcam settings.
Researchers can utilize this dataset to explore the impacts of web cameras on social and professional interactions, as well as to study the security features and audio quality associated with video streams. The dataset is particularly valuable for examining the nuances of remote working and the challenges faced during video conferences, including issues related to video quality and camera usage.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The dataset contains 65,000+ photos of more than 5,000 people from 40 countries. It serves as a valuable resource for researchers and developers working on identity and biometric verification solutions, especially in areas like facial recognition and financial services.
By utilizing this dataset, researchers can develop more robust re-identification algorithms, a key factor in ensuring privacy and security in various applications.
This dataset offers an opportunity to explore re-identification challenges by providing 13 selfies per individual against diverse backgrounds and under different lighting, paired with 2 ID photos from different document types.
Devices: Samsung M31, Infinix Note 11, Tecno Pop 7, Samsung A05, iPhone 15 Pro Max, and others
Resolution: 1000 x 750 and higher
This dataset enables the development of more robust and reliable authentication systems, ultimately contributing to enhancing customer onboarding experiences by streamlining verification processes, minimizing fraud, and improving overall security measures for a wide range of services, including online platforms, financial institutions, and government agencies.
Train AI to understand Japanese with Unidata’s dataset, featuring diverse speech samples for better transcription accuracy
Unidata Spine MRI dataset provides comprehensive spinal scans, improving AI’s ability to detect and diagnose spinal conditions
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains over 100,000 passport photos from 100+ countries, making it a valuable resource for researchers and developers working on computer vision tasks related to passport verification, biometric identification, and document analysis. This dataset allows researchers and developers to train and evaluate their models without the ethical and legal concerns associated with using real passport data.
By leveraging this dataset, developers can build robust and efficient document processing algorithms, contributing significantly to advancements in computer vision and identity verification.
The dataset includes a wide variety of passport photos, showcasing various background colors. It is designed to help developers and researchers build and train machine learning models that can accurately detect and analyze passport photos.
Photos in the dataset:
1. With background
2. Without background
The dataset can facilitate the development of applications aimed at improving security measures in border control and immigration processes. By utilizing advanced algorithms trained on diverse passport images, authorities can enhance the accuracy and speed of identity verification, reducing the risk of fraudulent activities.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The dataset comprises 3,100 images from 775 individuals, featuring male alopecia cases captured from two angles (front and top views) with corresponding segmentation masks. Designed for machine learning and deep learning models, this collection supports research in hair follicle analysis, hair density measurement, and scalp health evaluation.
By leveraging this dataset, researchers can improve learning algorithms for detecting hair disorders, evaluating hair restoration techniques, and training models for early diagnosis of alopecia.
The inclusion of segmentation masks ensures accurate classification of hair textures, scalp conditions, and hair growth stages, making it invaluable for medical image analysis and AI-driven dermatology.
Each participant contributes 4 images (original + mask pairs), enabling precise diagnosis of hair loss, hair thinning patterns, and treatment personalization.
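The stated counts can be cross-checked with a short sketch. It assumes that the 4 images per participant are one original and one mask for each of the two views (front and top); the file-naming scheme below is purely hypothetical and is not taken from the dataset.

```python
# Sanity check of the stated counts, assuming 2 views x (original + mask).
INDIVIDUALS = 775
VIEWS = ("front", "top")
KINDS = ("original", "mask")


def expected_files(person_id: int) -> list[str]:
    """Return the file names expected for one participant.

    The naming pattern is a hypothetical illustration only.
    """
    return [
        f"{person_id:04d}_{view}_{kind}.png"
        for view in VIEWS
        for kind in KINDS
    ]


images_per_person = len(VIEWS) * len(KINDS)
total_images = INDIVIDUALS * images_per_person

print(images_per_person)  # 4
print(total_images)       # 3100
```

Under that assumption, 775 participants with 4 images each gives the 3,100 images quoted in the description.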
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The dataset consists of 98,000 videos and selfies from 170 countries, providing a foundation for developing robust security systems and facial recognition algorithms.
While the dataset itself doesn't contain spoofing attacks, it's a valuable resource for testing liveness detection systems, allowing researchers to simulate attacks and evaluate how effectively their systems can distinguish between real faces and various forms of spoofing.
By utilizing this dataset, researchers can contribute to the development of advanced security solutions, enabling the safe and reliable use of biometric technologies for authentication and verification.
The dataset offers a high-quality collection of videos and photos, including selfies taken with a range of popular smartphones, like iPhone, Xiaomi, Samsung, and more. The videos showcase individuals turning their heads in various directions, providing a natural range of movements for liveness detection training.
Furthermore, the dataset provides detailed metadata for each set, including information like gender, age, ethnicity, video resolution, duration, and frames per second. This rich metadata provides crucial context for analysis and model development.
Researchers can develop more accurate liveness detection algorithms, which is crucial for achieving the iBeta Level 2 certification, a benchmark for robust and reliable biometric systems that prevent fraud.
Unidata provides a Russian Speech Recognition dataset to train AI for seamless speech-to-text conversion
Unidata’s Italian Speech Recognition dataset refines AI models for better speech-to-text conversion and language comprehension
Unidata’s Infrared Face Recognition dataset helps improve security systems and enhance AI performance in low-light conditions
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
License Plate - Character Recognition for LPR, ALPR and ANPR
The dataset features license plates from 32+ countries and includes 1,200,000+ images with OCR. It focuses on plate recognition and related detection systems, providing detailed information on plate numbers, country, bbox labeling, and other data, as well as corresponding masks for recognition tasks. The dataset encompasses plate detection systems, cameras, and character recognition for accurate… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/license-plate-detection.
Video dataset capturing diverse facial expressions and emotions from 1000+ people, suitable for emotion recognition AI training
This dataset contains the historical Unidata Internet Data Distribution (IDD) Global Observational Data that are derived from real-time Global Telecommunications System (GTS) reports distributed via the Unidata Internet Data Distribution System (IDD). Reports include surface station (SYNOP) reports at 3-hour intervals, upper air (RAOB) reports at 3-hour intervals, surface station (METAR) reports at 1-hour intervals, and marine surface (BUOY) reports at 1-hour intervals. Select variables found in all report types include pressure, temperature, wind speed, and wind direction. Data may be available at mandatory or significant levels from 1000 millibars to 1 millibar, and at surface levels. Online archives are populated daily with reports generated two days prior to the current date.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
LLM Fine-Tuning Dataset - 4,000,000+ logs, 32 languages
The dataset contains over 4 million logs written in 32 languages and is tailored for LLM training. It includes log and response pairs from 3 models and is designed for instruction fine-tuning of language models to improve performance on various NLP tasks.
Models used for text generation:
GPT-3.5
GPT-4
Uncensored GPT Version (not included in the sample)
Languages in the… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/llm-training-dataset.
LLM Text Generation dataset offers multilingual text samples from large language models, enriching AI’s natural language understanding
Unidata’s German Speech Recognition dataset enhances AI transcription, ensuring precise speech-to-text conversion and language understanding
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Italian Speech Dataset for recognition task
The dataset comprises 499 hours of telephone dialogues in Italian, collected from 670+ native speakers across various topics and domains, with a reported 98% Word Accuracy Rate. It is designed for research in automatic speech recognition (ASR) systems. By utilizing this dataset, researchers and developers can advance their understanding and capabilities in natural language processing (NLP), speech recognition, and machine learning… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/italian-speech-recognition-dataset.
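The description does not say how the Word Accuracy Rate is computed. A common convention, assumed here, is word accuracy = 1 - WER, where the word error rate (WER) is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function names and the sample sentences below are illustrative only and do not come from the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy under the assumed convention 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)


# One of five reference words is dropped, so WER is 1/5.
reference = "buongiorno come posso aiutarla oggi"
hypothesis = "buongiorno come posso aiutarla"
print(word_error_rate(reference, hypothesis))  # 0.2
```

Note that under this convention word accuracy can drop below zero when the hypothesis contains many inserted words, so some tools clamp it at 0.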
Motherlode is a prototype system that serves real-time weather data via a number of different interfaces using software packages developed at Unidata. The primary objective is to have a place to implement, experiment with, and demonstrate web data services technologies. Much of the real-time data available over the Unidata Internet Data Distribution (IDD) is available through the Motherlode THREDDS Data Server hosted at Unidata on motherlode.ucar.edu. Real-time weather data are served via the WCS, OPeNDAP, HTTP, and FTP protocols. The holdings include the output of NCEP weather forecast models, all US NEXRAD Level II and Level III data, GOES satellite imagery, and surface and sounding observations from around the globe. Depending on the specific data type, the server has about 1 to 2 weeks of data. The data are typically available on the server within seconds of when they are available from the source.