100+ datasets found

200 Million High-quality Image Data
m.nexdata.ai
Updated Apr 7, 2025
Cite
Nexdata (2025). 200 Million High-quality Image Data [Dataset]. https://m.nexdata.ai/datasets/computervision/1793
Dataset updated
Apr 7, 2025
Dataset authored and provided by
Nexdata
Variables measured
Data size, Image type, Data format, Data content, Image resolution
Description
This image database contains 200 million high-quality images that have undergone professional review. The resources are diverse in type, featuring high resolution and clarity, excellent color accuracy, and rich detail. All materials have been legally obtained through authorized channels, with clear indications of copyright ownership and usage authorization scope. The entire collection provides commercial-grade usage rights and has been granted permission for scientific research use, ensuring clear and traceable intellectual property attribution. The vast and high-quality image resources offer robust support for a wide range of applications, including research in the field of computer vision, training of image recognition algorithms, and sourcing materials for creative design, thereby facilitating efficient progress in related areas.
Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev...
service.tib.eu
Updated Dec 16, 2024
Cite
Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev (2024). High Quality Image-Text Pairs (HQITP) [Dataset]. https://doi.org/10.57702/x0qiuh4s https://service.tib.eu/ldmservice/dataset/high-quality-image-text-pairs--hqitp-
Dataset updated
Dec 16, 2024
Description
The High Quality Image-Text Pairs (HQITP) dataset contains 134M high-quality image-caption pairs.
Data from: CQ100: A High-Quality Image Dataset for Color Quantization...
data.mendeley.com
Updated Dec 17, 2024
+ more versions
Cite
M. Emre Celebi (2024). CQ100: A High-Quality Image Dataset for Color Quantization Research [Dataset]. http://doi.org/10.17632/vw5ys9hfxw.4
Unique identifier
https://doi.org/10.17632/vw5ys9hfxw.4
Dataset updated
Dec 17, 2024
Authors
M. Emre Celebi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CQ100 is a diverse and high-quality dataset of color images that can be used to develop, test, and compare color quantization algorithms. The dataset can also be used in other color image processing tasks, including filtering and segmentation.

If you find CQ100 useful, please cite the following publication: M. E. Celebi and M. L. Perez-Delgado, “CQ100: A High-Quality Image Dataset for Color Quantization Research,” Journal of Electronic Imaging, vol. 32, no. 3, 033019, 2023.

You may download the above publication free of charge from: https://www.spiedigitallibrary.org/journals/journal-of-electronic-imaging/volume-32/issue-3/033019/cq100--a-high-quality-image-dataset-for-color-quantization/10.1117/1.JEI.32.3.033019.full?SSO=1
Data from: High Resolution Water Quality Dataset of Chinese Lakes and...
figshare.com
txt
Updated Feb 24, 2025
Cite
Shilong Luan; Huixiao Pan; Ruoque Shen; Xiaosheng Xia; Hongtao Duan; Wenping Yuan; Jing Wei (2025). High Resolution Water Quality Dataset of Chinese Lakes and Reservoirs from 2000 to 2023 [Dataset]. http://doi.org/10.6084/m9.figshare.27626286.v2
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.27626286.v2
Dataset updated
Feb 24, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Shilong Luan; Huixiao Pan; Ruoque Shen; Xiaosheng Xia; Hongtao Duan; Wenping Yuan; Jing Wei
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
China
Description
The dataset includes monthly data of eight water quality parameters for lakes and reservoirs in China from 2000 to 2023. The data were simulated using random forest models, taking into account the impacts of climate, soil properties, and anthropogenic activities. These water quality parameters are pH, dissolved oxygen (DO; mg/L), total nitrogen (TN; mg/L), total phosphorus (TP; mg/L), permanganate index (CODMn; mg/L), turbidity (Tur; JTU), electrical conductivity (EC; S/m) and dissolved organic carbon (DOC; mg/L). The data is stored in CSV format, sorted by lake and reservoir, and each CSV file contains monthly water quality data for the lake or reservoir and corresponding coordinates.
High-quality diffusion-weighted imaging of Parkinsons disease
dknet.org
scicrunch.org
+1 more
Updated Jan 29, 2022
+ more versions
Cite
(2022). High-quality diffusion-weighted imaging of Parkinsons disease [Dataset]. http://identifiers.org/RRID:SCR_014121
Unique identifier
https://identifiers.org/RRID:SCR_014121
Dataset updated
Jan 29, 2022
Description
A project containing data and analysis pipelines for a set of 53 subjects in a cross-sectional Parkinson's disease (PD) study. The dataset contains diffusion-weighted images (DWI) of 27 PD patients and 26 age-, sex-, and education-matched control subjects. The DWIs were acquired with 120 unique gradient directions, b=1000 and b=2500 s/mm², and isotropic 2.4 mm³ voxels. The acquisition used a twice-refocused spin echo sequence in order to avoid distortions induced by eddy currents.
BIG Dataset
paperswithcode.com
Cite
Ho Kei Cheng; Jihoon Chung; Yu-Wing Tai; Chi-Keung Tang, BIG Dataset [Dataset]. https://paperswithcode.com/dataset/big
Authors
Ho Kei Cheng; Jihoon Chung; Yu-Wing Tai; Chi-Keung Tang
Description
A high-resolution semantic segmentation dataset with 50 validation and 100 test objects. Image resolution in BIG ranges from 2048×1600 to 5000×3600. Every image in the dataset has been carefully labeled by a professional while keeping the same guidelines as PASCAL VOC 2012 without the void region.
PartImageNet Dataset
paperswithcode.com
+ more versions
Cite
Ju He; Shuo Yang; Shaokang Yang; Adam Kortylewski; Xiaoding Yuan; Jie-Neng Chen; Shuai Liu; Cheng Yang; Qihang Yu; Alan Yuille, PartImageNet Dataset [Dataset]. https://paperswithcode.com/dataset/partimagenet
Authors
Ju He; Shuo Yang; Shaokang Yang; Adam Kortylewski; Xiaoding Yuan; Jie-Neng Chen; Shuai Liu; Cheng Yang; Qihang Yu; Alan Yuille
Description
PartImageNet is a large, high-quality dataset with part segmentation annotations. It consists of 158 classes from ImageNet with approximately 24000 images. PartImageNet offers part-level annotations on a general set of classes with non-rigid, articulated objects, while having an order of magnitude larger size compared to existing datasets. It can be utilized in multiple vision tasks including but not limited to: Part Discovery, Semantic Segmentation, Few-shot Learning.
DIV2K High Resolution Images
gts.ai
json
Updated Jul 15, 2024
Cite
GTS (2024). DIV2K High Resolution Images [Dataset]. https://gts.ai/dataset-download/div2k-high-resolution-images/
Available download formats: json
Dataset updated
Jul 15, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Explore the DIV2K Dataset, a comprehensive collection of 1000 high-resolution RGB images designed for NTIRE and PIRM challenges.
Mobile Icon | Mobile Screenshots Dataset
kaggle.com
Updated Jan 30, 2025
Cite
DataCluster Labs (2025). Mobile Icon | Mobile Screenshots Dataset [Dataset]. https://www.kaggle.com/datasets/dataclusterlabs/mobile-icon-mobile-screenshots-dataset/data
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Jan 30, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
DataCluster Labs
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Mobile Icon | Mobile Screenshot Dataset is a meticulously curated collection of 9,000+ high-quality mobile screenshots, categorized across 13 diverse application types. This dataset is designed to support AI/ML researchers, UI/UX analysts, and developers in advancing mobile interface understanding, image classification, and content recognition.

Each image has been manually reviewed and verified by computer vision professionals at DataCluster Labs, ensuring high-quality and reliable data for research and development purposes.

Categories Included

Technical Applications

Wallpapers

News & Magazines

Business & Finance

Agriculture

Entertainment and many more.

Potential Applications:

AI & ML model training (image classification, UI/UX analysis, OCR).

Mobile app usability and accessibility research.

Content recognition and recommendation systems.

The images in this dataset are exclusively owned by Data Cluster Labs and were not downloaded from the internet. To access a larger portion of the training dataset for research and commercial purposes, a license can be purchased. Contact us at sales@datacluster.ai. Visit www.datacluster.ai to learn more.
Increasing Access to High-Quality Early Childhood Education 2013
catalog.data.gov
datasets.ai
Updated Sep 27, 2024
+ more versions
Cite
Office for Civil Rights (OCR) (2024). Increasing Access to High-Quality Early Childhood Education 2013 [Dataset]. https://catalog.data.gov/dataset/increasing-access-to-high-quality-early-childhood-education-2013
Dataset updated
Sep 27, 2024
Dataset provided by
Office for Civil Rights (OCR)
Description
The President believes we need to equip every child with the skills and education they need to be on a clear path to a good job and the middle class. To ensure these opportunities are available to all, President Obama has put forward a comprehensive early learning proposal to build a strong foundation for success in the first five years of life. These investments will help close America's school readiness gap and ensure that America's children enter kindergarten ready to succeed.
High-Quality Invoice Images for OCR Dataset
paperswithcode.com
Updated Apr 28, 2025
Cite
Freddy C. Chua; Nigel P. Duffy (2025). High-Quality Invoice Images for OCR Dataset [Dataset]. https://paperswithcode.com/dataset/high-quality-invoice-images-for-ocr
Dataset updated
Apr 28, 2025
Authors
Freddy C. Chua; Nigel P. Duffy
Description
Dataset link: https://www.kaggle.com/datasets/osamahosamabdellatif/high-quality-invoice-images-for-ocr

Overview

High-Quality Invoice Images for OCR is a curated dataset containing professionally scanned and digitally captured invoice documents. It is designed for training, fine-tuning, and evaluating OCR models, machine learning pipelines, and data extraction systems.

This dataset focuses on clean, structured invoices to simulate real-world scenarios in financial document automation.

What's Inside

📄 Variety of invoice templates from multiple industries (e.g., retail, manufacturing, services)

🖋️ Different currencies, tax formats, and layouts

📸 High-resolution scanned and photographed invoices

🏷️ Optional field annotations (e.g., invoice number, date, total amount, vendor name) for supervised training

Key Applications

Training and fine-tuning OCR and Document AI models

Machine learning for structured and semi-structured data extraction

Intelligent Document Processing (IDP) and Robotic Process Automation (RPA)

Benchmarking table detection, key-value extraction, and layout analysis models

Why Use This Dataset?

✅ High-quality images optimized for OCR and data extraction tasks

✅ Real-world invoice variations to improve model robustness

✅ Ideal for machine learning workflows in finance, ERP, and accounting systems

✅ Supports rapid prototyping for invoice understanding models

Ideal For

Researchers working on OCR and document understanding

Developers building invoice processing systems

Machine learning engineers fine-tuning models for data extraction

Startups and enterprises automating financial workflows
Data Labeling Market Report
datainsightsmarket.com
doc, pdf, ppt
Updated Mar 8, 2025
Cite
Data Insights Market (2025). Data Labeling Market Report [Dataset]. https://www.datainsightsmarket.com/reports/data-labeling-market-20383
Available download formats: doc, ppt, pdf
Dataset updated
Mar 8, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The data labeling market is experiencing robust growth, projected to reach $3.84 billion in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 28.13% from 2025 to 2033. This expansion is fueled by the increasing demand for high-quality training data across various sectors, including healthcare, automotive, and finance, which heavily rely on machine learning and artificial intelligence (AI). The surge in AI adoption, particularly in areas like autonomous vehicles, medical image analysis, and fraud detection, necessitates vast quantities of accurately labeled data. The market is segmented by sourcing type (in-house vs. outsourced), data type (text, image, audio), labeling method (manual, automatic, semi-supervised), and end-user industry. Outsourcing is expected to dominate the sourcing segment due to cost-effectiveness and access to specialized expertise. Similarly, image data labeling is likely to hold a significant share, given the visual nature of many AI applications. The shift towards automation and semi-supervised techniques aims to improve efficiency and reduce labeling costs, though manual labeling will remain crucial for tasks requiring high accuracy and nuanced understanding. Geographical distribution shows strong potential across North America and Europe, with Asia-Pacific emerging as a key growth region driven by increasing technological advancements and digital transformation. Competition in the data labeling market is intense, with a mix of established players like Amazon Mechanical Turk and Appen, alongside emerging specialized companies. The market's future trajectory will likely be shaped by advancements in automation technologies, the development of more efficient labeling techniques, and the increasing need for specialized data labeling services catering to niche applications. Companies are focusing on improving the accuracy and speed of data labeling through innovations in AI-powered tools and techniques. 
Furthermore, the rise of synthetic data generation offers a promising avenue for supplementing real-world data, potentially addressing data scarcity challenges and reducing labeling costs in certain applications. This will, however, require careful attention to ensure that the synthetic data generated is representative of real-world data to maintain model accuracy.

This comprehensive report provides an in-depth analysis of the global data labeling market, offering insights for businesses, investors, and researchers. The study period covers 2019-2033, with 2025 as the base and estimated year, and a forecast period of 2025-2033. We delve into market size, segmentation, growth drivers, challenges, and emerging trends, examining the impact of technological advancements and regulatory changes on this rapidly evolving sector. The market is projected to reach multi-billion dollar valuations by 2033, fueled by the increasing demand for high-quality data to train sophisticated machine learning models.

Recent developments include:

September 2024: The National Geospatial-Intelligence Agency (NGA) is poised to invest heavily in artificial intelligence, earmarking up to USD 700 million for data labeling services over the next five years. This initiative aims to enhance NGA's machine-learning capabilities, particularly in analyzing satellite imagery and other geospatial data. The agency has opted for a multi-vendor indefinite-delivery/indefinite-quantity (IDIQ) contract, emphasizing the importance of annotating raw data, be it images or videos, to render it understandable for machine learning models. For instance, when dealing with satellite imagery, the focus could be on labeling distinct entities such as buildings, roads, or patches of vegetation.

October 2023: Refuel.ai unveiled a new platform, Refuel Cloud, and a specialized large language model (LLM) for data labeling. Refuel Cloud harnesses advanced LLMs, including its proprietary model, to automate data cleaning, labeling, and enrichment at scale, catering to diverse industry use cases. Recognizing that clean data underpins modern AI and data-centric software, Refuel Cloud addresses the historical challenge of human labor bottlenecks in data production. With Refuel Cloud, enterprises can swiftly generate the expansive, precise datasets they require in mere minutes, a task that traditionally spanned weeks.

Key drivers for this market are: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML.

Potential restraints include: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML.

Notable trends are: Healthcare is Expected to Witness Remarkable Growth.
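As a rough check on the figures quoted in this description, the sketch below compounds the stated 2025 market size at the stated CAGR through 2033. The function name is hypothetical, and constant annual compounding over eight years is an assumption; the report may define its forecast window differently.

```python
def project_market_size(base_value: float, cagr: float, years: int) -> float:
    """Compound a base value forward by `years` at a constant annual growth rate."""
    return base_value * (1.0 + cagr) ** years


# Figures quoted in the description: USD 3.84 billion in 2025, 28.13% CAGR, 2025-2033.
# Assumption: growth compounds once per year for 2033 - 2025 = 8 years.
size_2033 = project_market_size(3.84, 0.2813, 2033 - 2025)
print(f"Implied 2033 market size: about USD {size_2033:.1f} billion")
```

Under these assumptions the implied 2033 value is in the high twenties of billions of USD, which is consistent with the description's "multi-billion dollar" wording.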
GlobalHighPM₂.₅: Global Daily Seamless 1 km Ground-Level PM₂.₅ Dataset over...
zenodo.org
nc, pdf, zip
Updated May 23, 2025
+ more versions
Cite
Jing Wei; Zhanqing Li; Alexei Lyapustin; Jun Wang; Oleg Dubovik; Joel Schwartz; Lin Sun; Chi Li; Song Liu; Tong Zhu (2025). GlobalHighPM₂.₅: Global Daily Seamless 1 km Ground-Level PM₂.₅ Dataset over Land (2017–Present) [Dataset]. http://doi.org/10.5281/zenodo.10800980
Available download formats: nc, zip, pdf
Unique identifier
https://doi.org/10.5281/zenodo.10800980
Dataset updated
May 23, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jing Wei; Zhanqing Li; Alexei Lyapustin; Jun Wang; Oleg Dubovik; Joel Schwartz; Lin Sun; Chi Li; Song Liu; Tong Zhu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 11, 2022
Description
GlobalHighPM_2.5 is part of a series of long-term, seamless, global, high-resolution, and high-quality datasets of air pollutants over land (i.e., GlobalHighAirPollutants, GHAP). It is generated from big data sources (e.g., ground-based measurements, satellite remote sensing products, atmospheric reanalysis, and model simulations) using artificial intelligence, taking into account the spatiotemporal heterogeneity of air pollution.

This dataset contains input data, analysis codes, and generated dataset used for the following article. If you use the GlobalHighPM_2.5 dataset in your scientific research, please cite the following reference (Wei et al., NC, 2023):

Wei, J., Li, Z., Lyapustin, A., Wang, J., Dubovik, O., Schwartz, J., Sun, L., Li, C., Liu, S., and Zhu, T. First close insight into global daily gapless 1 km PM_2.5 pollution, variability, and health impact. Nature Communications, 2023, 14, 8349. https://doi.org/10.1038/s41467-023-43862-3

Input Data

Relevant raw data for each figure (compiled into a single sheet within an Excel document) in the manuscript.

Code

Relevant Python scripts for replicating and plotting the analysis results in the manuscript, as well as codes for converting data formats.

Generated Dataset

Here is the first big data-derived seamless (spatial coverage = 100%) daily, monthly, and yearly 1 km (i.e., D1K, M1K, and Y1K) global ground-level PM_2.5 dataset over land from 2017 to the present. This dataset exhibits high quality, with cross-validation coefficients of determination (CV-R²) of 0.91, 0.97, and 0.98, and root-mean-square errors (RMSEs) of 9.20, 4.15, and 2.77 µg m^-3 on the daily, monthly, and annual bases, respectively.
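For readers unfamiliar with the two quality metrics quoted above, the conventional definitions are given below, with $y_i$ the observed PM₂.₅ values, $\hat{y}_i$ the cross-validated predictions, $\bar{y}$ the mean observation, and $n$ the number of samples. These are the standard textbook forms and are an assumption here; the listing does not state which variant of R² the authors use (some studies report the squared correlation coefficient instead).

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}},
\qquad
R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}
```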

Due to data volume limitations,

all (including daily) data for the year 2022 is accessible at: GlobalHighPM2.5 (2022)

all (including daily) data for the year 2021 is accessible at: GlobalHighPM2.5 (2021)

all (including daily) data for the year 2020 is accessible at: GlobalHighPM2.5 (2020)

all (including daily) data for the year 2019 is accessible at: GlobalHighPM2.5 (2019)

all (including daily) data for the year 2018 is accessible at: GlobalHighPM2.5 (2018)

all (including daily) data for the year 2017 is accessible at: GlobalHighPM2.5 (2017)

continuously updated...

More GHAP datasets for different air pollutants are available at: https://weijing-rs.github.io/product.html
ScaleCap-450k
huggingface.co
Updated Jun 25, 2025
Cite
Long Xing (2025). ScaleCap-450k [Dataset]. https://huggingface.co/datasets/long-xing1/ScaleCap-450k
Dataset updated
Jun 25, 2025
Authors
Long Xing
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
[Paper] https://arxiv.org/abs/2506.19848 [GitHub] https://github.com/Cooperx521/ScaleCap

ScaleCap450k: Hyper-detailed and high-quality image caption dataset

Dataset details

This dataset contains 450k image-caption pairs, where the captions are annotated using the ScaleCap pipeline. For more details, please refer to the paper. In collecting images for our dataset, we primarily focus on two aspects: diversity and richness of image content. Given that the ShareGPT4V-100k already… See the full description on the dataset page: https://huggingface.co/datasets/long-xing1/ScaleCap-450k.
GlobalHighPM2.5: Global high-resolution and high-quality ground-level PM2.5...
data.tpdc.ac.cn
tpdc.ac.cn
zip
Updated Mar 18, 2024
+ more versions
Cite
Jing WEI; Zhanqing LI (2024). GlobalHighPM2.5: Global high-resolution and high-quality ground-level PM2.5 dataset over land (2017-2022) [Dataset]. http://doi.org/10.5281/zenodo.6449740
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6449740
Dataset updated
Mar 18, 2024
Dataset provided by
TPDC
Authors
Jing WEI; Zhanqing LI
Description
GlobalHighPM2.5 is one of a series of long-term, full-coverage, global, high-resolution, and high-quality datasets of ground-level air pollutants over land (i.e., GlobalHighAirPollutants, GHAP). It is generated from big data (e.g., ground-based measurements, satellite remote sensing products, atmospheric reanalysis, and model simulations) using artificial intelligence, taking into account the spatiotemporal heterogeneity of air pollution. In ten-fold cross-validation, the coefficient of determination (R²) is 0.91 and the root-mean-square error (RMSE) is 9.2 µg/m³. The dataset covers the entire global land area at a spatial resolution of 1 km and temporal resolutions of day, month, and year, with values in µg/m³. Attention: this dataset is recorded in Universal Time (UTC, GMT+0) and is continuously updated. If you need more data, please contact the author by email (weijing_rs@163.com; weijing@umd.edu). The data file includes nc2geotiff codes for converting NC to GeoTIFF in four languages (Python, Matlab, IDL, and R).
THVD (Talking Head Video Dataset)
kaggle.com
Updated Apr 28, 2025
+ more versions
Cite
LipSynthesis (2025). THVD (Talking Head Video Dataset) [Dataset]. https://www.kaggle.com/datasets/mariopd/talking-head-video-dataset-23k-identities
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Apr 28, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
LipSynthesis
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
About

We provide a comprehensive talking-head video dataset with over 50,000 videos, totaling more than 500 hours of footage and featuring 20,841 unique identities from around the world.

Distribution

Detailing the format, size, and structure of the dataset:

Data Volume:

-Total Size: 2.7TB

-Total Videos: 47,547

-Identities Covered: 20,841

-Resolution: 60% 4k(1980), 33% fullHD(1080)

-Formats: MP4

-Full-length videos with visible mouth movements in every frame.

-Minimum face size of 400 pixels.

-Video durations range from 20 seconds to 5 minutes.

-Faces have not been cut out, full screen videos including backgrounds.

Usage

This dataset is ideal for a variety of applications:

Face Recognition & Verification: Training and benchmarking facial recognition models.

Action Recognition: Identifying human activities and behaviors.

Re-Identification (Re-ID): Tracking identities across different videos and environments.

Deepfake Detection: Developing methods to detect manipulated videos.

Generative AI: Training high-resolution video generation models.

Lip Syncing Applications: Enhancing AI-driven lip-syncing models for dubbing and virtual avatars.

Background AI Applications: Developing AI models for automated background replacement, segmentation, and enhancement.

Coverage

Explaining the scope and coverage of the dataset:

Geographic Coverage: Worldwide

Time Range: Time range and size of the videos have been noted in the CSV file.

Demographics: Includes information about age, gender, ethnicity, format, resolution, and file size.

Languages Covered (Videos):

English: 23,038 videos

Portuguese: 1,346 videos

Spanish: 677 videos

Norwegian: 1,266 videos

Swedish: 1,056 videos

Korean: 848 videos

Polish: 1,807 videos

Indonesian: 1,163 videos

French: 1,102 videos

German: 1,276 videos

Japanese: 1,433 videos

Dutch: 1,666 videos

Indian: 1,163 videos

Czech: 590 videos

Chinese: 685 videos

Italian: 975 videos

Filipino: 920 videos

Bulgarian: 340 videos

Romanian: 1,144 videos

Arabic: 1,691 videos
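The per-language counts above can be tallied against the "Total Videos" figure quoted under Data Volume. The sketch below is only an illustrative consistency check: the dictionary copies the numbers from the listing, and the variable names are hypothetical. The listing does not explain any difference between the two totals, so the gap is reported without interpretation.

```python
# Per-language video counts as given in the listing (labels lightly normalized).
language_counts = {
    "English": 23_038, "Portuguese": 1_346, "Spanish": 677, "Norwegian": 1_266,
    "Swedish": 1_056, "Korean": 848, "Polish": 1_807, "Indonesian": 1_163,
    "French": 1_102, "German": 1_276, "Japanese": 1_433, "Dutch": 1_666,
    "Indian": 1_163, "Czech": 590, "Chinese": 685, "Italian": 975,
    "Filipino": 920, "Bulgarian": 340, "Romanian": 1_144, "Arabic": 1_691,
}

reported_total = 47_547  # "Total Videos" from the Data Volume section
listed_total = sum(language_counts.values())
gap = reported_total - listed_total

print(f"Videos in listed languages: {listed_total:,}")
print(f"Reported total videos:      {reported_total:,}")
print(f"Not covered by the list:    {gap:,}")
```

A check like this is worth running against the vendor's CSV before purchase, since the description notes that the dataset is still growing.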


Who Can Use It

List examples of intended users and their use cases:

Data Scientists: Training machine learning models for video-based AI applications.

Researchers: Studying human behavior, facial analysis, or video AI advancements.

Businesses: Developing facial recognition systems, video analytics, or AI-driven media applications.

Additional Notes

Ensure ethical usage and compliance with privacy regulations. The dataset's quality and scale make it valuable for high-performance AI training. Preprocessing (cropping, downsampling) may be needed for different use cases. The dataset is not yet complete and expands daily; please contact the author for the most up-to-date CSV file. The dataset has been divided into 20GB zipped files and is hosted on a private server (with the option to upload to the cloud if needed). To verify the dataset's quality, please contact me for the full CSV file. I'd be happy to provide example videos selected by the potential buyer.
High Resolution Voyager 2 Images of Neptune's Moon, Triton
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
+ more versions
Cite
U.S. Geological Survey (2024). High Resolution Voyager 2 Images of Neptune's Moon, Triton [Dataset]. https://catalog.data.gov/dataset/high-resolution-voyager-2-images-of-neptunes-moon-triton
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
We processed 41 Voyager 2 images of Neptune's moon Triton with pixel scales < 2 km/pixel from their raw, compressed, archived state to more usable cloud-optimized GeoTIFFs, which can easily be used within spatial analysis software such as GIS. Processing was done using the USGS' ISIS software and included geometric and radiometric calibration, and the removal of image reseaux and corner markers (originally used for geometric calibration). The images were also photogrammetrically controlled relative to one another to improve their locations on the surface, which were initially inaccurate by up to 200 km. After performing a least squares bundle adjustment, the root mean square (RMS) uncertainty in image locations was 0.50, 0.52, and 0.51 pixels in latitude, longitude, and radius, respectively, with minimum and maximum residuals of -4.21 and +3.197 pixels, respectively. Each individual image was warped to an orthographic projection centered at 15° W and 18° N at the native image resolution. Because reseaux removal introduces interpolated data, two versions of each image are provided: a fully processed version with reseaux removed, and a partially processed version that retains the reseaux (i.e., no interpolation). We also generated a mosaic of Triton images that is spatially consistent with the entire dataset and provides context for the individual images (not every image is included in the mosaic). The mosaic uses the same orthographic projection as the individual images, but a consistent scale of 600 m/pixel. Three versions of the mosaic are included: a fully processed version (reseaux removed, some interpolated pixel values), a partially processed version (reseaux retained, no interpolation), and a fully processed but sharpened version (high pass filter with 100% albedo add-back) to enhance surface features. This data release improves the usability and accessibility of this singular dataset and enables new scientific investigations of Triton.
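To make the projection mentioned above concrete, here is a minimal sketch of the standard forward orthographic projection on a sphere, centered at 18° N, 15° W. It is not the USGS ISIS implementation. The function name, the east-positive longitude convention (so 15° W becomes -15°), and the Triton mean radius of 1353.4 km are assumptions introduced for illustration.

```python
import math

TRITON_RADIUS_KM = 1353.4  # assumed mean radius; not stated in the listing


def orthographic_forward(lat_deg, lon_east_deg,
                         lat0_deg=18.0, lon0_east_deg=-15.0,
                         radius_km=TRITON_RADIUS_KM):
    """Project (lat, lon) to planar (x, y) in km on a spherical body.

    Returns None for points on the hemisphere facing away from the
    projection center, which an orthographic view cannot show.
    """
    phi, lam = math.radians(lat_deg), math.radians(lon_east_deg)
    phi0, lam0 = math.radians(lat0_deg), math.radians(lon0_east_deg)
    dlam = lam - lam0

    # Cosine of the angular distance from the projection center.
    cos_c = (math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(dlam))
    if cos_c < 0:
        return None

    x = radius_km * math.cos(phi) * math.sin(dlam)
    y = radius_km * (math.cos(phi0) * math.sin(phi)
                     - math.sin(phi0) * math.cos(phi) * math.cos(dlam))
    return x, y


# A point 10 degrees due north of the projection center.
x_km, y_km = orthographic_forward(28.0, -15.0)
print(f"x = {x_km:.1f} km, y = {y_km:.1f} km")
```

Dividing the resulting offsets by 0.6 km would give approximate pixel offsets from the projection center at the mosaic's 600 m/pixel scale, again assuming a spherical body.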
United States - 61.5-Year High Quality Market (HQM) Corporate Bond Spot Rate...
tradingeconomics.com
csv, excel, json, xml
Updated Feb 24, 2020
Cite
TRADING ECONOMICS (2020). United States - 61.5-Year High Quality Market (HQM) Corporate Bond Spot Rate [Dataset]. https://tradingeconomics.com/united-states/61-5-year-high-quality-market-hqm-corporate-bond-spot-rate-fed-data.html
Available download formats: csv, json, xml, excel
Dataset updated
Feb 24, 2020
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1976 - Dec 31, 2025
Area covered
United States
Description
United States - 61.5-Year High Quality Market (HQM) Corporate Bond Spot Rate was 5.95% in March of 2025, according to the United States Federal Reserve. Historically, the rate reached a record high of 12.50% in June of 1984 and a record low of 3.09% in December of 2020. Trading Economics provides the current actual value, a historical data chart, and related indicators for this series, last updated from the United States Federal Reserve in May of 2025.
High-Quality Wetlands
data-wi-dnr.opendata.arcgis.com
Updated Jul 26, 2023
Cite
Wisconsin Department of Natural Resources (2023). High-Quality Wetlands [Dataset]. https://data-wi-dnr.opendata.arcgis.com/datasets/high-quality-wetlands/explore
Explore at:
Dataset updated
Jul 26, 2023
Dataset authored and provided by
Wisconsin Department of Natural Resources http://dnr.wi.gov/
Area covered

Description
High-Quality Wetland points displayed in the DNR Watershed Restoration and Protection Viewer. These are unique wetlands and those wetlands with least disturbed or reference conditions. Points represent a generalized area, for legal and privacy reasons. All points are in HUCs that fall mostly within Wisconsin.
High-resolution infrared color satellite cloud map - East Asia
data.gov.tw
json, xml
+ more versions
Cite
Central Weather Administration Ministry of Transportation and Communications, High-resolution infrared color satellite cloud map - East Asia [Dataset]. https://data.gov.tw/en/datasets/8193
Explore at:
Available download formats: json, xml
Dataset authored and provided by
Central Weather Administration Ministry of Transportation and Communications
License
https://data.gov.tw/license
Area covered
East Asia, Asia
Description
High-resolution satellite cloud image data. Note: the download URL changed as of September 15, 2023. Please switch to the new URL by December 31, 2023; the old link will expire after that deadline. Users who need to download a large amount of data should apply for membership at the open platform for meteorological data: https://opendata.cwa.gov.tw/index
