100+ datasets found
  1. 200 Million High-quality Image Data

    • m.nexdata.ai
    Updated Apr 7, 2025
    Cite
    Nexdata (2025). 200 Million High-quality Image Data [Dataset]. https://m.nexdata.ai/datasets/computervision/1793
    Explore at:
    Dataset updated
    Apr 7, 2025
    Dataset authored and provided by
    Nexdata
    Variables measured
    Data size, Image type, Data format, Data content, Image resolution
    Description

    This image database contains 200 million high-quality images that have undergone professional review. The resources are diverse in type, featuring high resolution and clarity, excellent color accuracy, and rich detail. All materials have been legally obtained through authorized channels, with clear indications of copyright ownership and usage authorization scope. The entire collection provides commercial-grade usage rights and has been granted permission for scientific research use, ensuring clear and traceable intellectual property attribution. The vast and high-quality image resources offer robust support for a wide range of applications, including research in the field of computer vision, training of image recognition algorithms, and sourcing materials for creative design, thereby facilitating efficient progress in related areas.

  2. Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev...

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev (2024). Dataset: High Quality Image-Text Pairs (HQITP). https://doi.org/10.57702/x0qiuh4s [Dataset]. https://service.tib.eu/ldmservice/dataset/high-quality-image-text-pairs--hqitp-
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The High Quality Image-Text Pairs (HQITP) dataset contains 134M high-quality image-caption pairs.

  3. Data from: CQ100: A High-Quality Image Dataset for Color Quantization...

    • data.mendeley.com
    Updated Dec 17, 2024
    + more versions
    Cite
    M. Emre Celebi (2024). CQ100: A High-Quality Image Dataset for Color Quantization Research [Dataset]. http://doi.org/10.17632/vw5ys9hfxw.4
    Explore at:
    Dataset updated
    Dec 17, 2024
    Authors
    M. Emre Celebi
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CQ100 is a diverse and high-quality dataset of color images that can be used to develop, test, and compare color quantization algorithms. The dataset can also be used in other color image processing tasks, including filtering and segmentation.

    If you find CQ100 useful, please cite the following publication: M. E. Celebi and M. L. Perez-Delgado, “CQ100: A High-Quality Image Dataset for Color Quantization Research,” Journal of Electronic Imaging, vol. 32, no. 3, 033019, 2023.

    You may download the above publication free of charge from: https://www.spiedigitallibrary.org/journals/journal-of-electronic-imaging/volume-32/issue-3/033019/cq100--a-high-quality-image-dataset-for-color-quantization/10.1117/1.JEI.32.3.033019.full?SSO=1

  4. Data from: High Resolution Water Quality Dataset of Chinese Lakes and...

    • figshare.com
    txt
    Updated Feb 24, 2025
    Cite
    Shilong Luan; Huixiao Pan; Ruoque Shen; Xiaosheng Xia; Hongtao Duan; Wenping Yuan; Jing Wei (2025). High Resolution Water Quality Dataset of Chinese Lakes and Reservoirs from 2000 to 2023 [Dataset]. http://doi.org/10.6084/m9.figshare.27626286.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 24, 2025
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Shilong Luan; Huixiao Pan; Ruoque Shen; Xiaosheng Xia; Hongtao Duan; Wenping Yuan; Jing Wei
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    The dataset includes monthly data on eight water quality parameters for lakes and reservoirs in China from 2000 to 2023. The data were simulated using random forest models, taking into account the impacts of climate, soil properties, and anthropogenic activities. The water quality parameters are pH, dissolved oxygen (DO; mg/L), total nitrogen (TN; mg/L), total phosphorus (TP; mg/L), permanganate index (CODMn; mg/L), turbidity (Tur; JTU), electrical conductivity (EC; S/m), and dissolved organic carbon (DOC; mg/L). The data are stored in CSV format, organized by lake and reservoir; each CSV file contains the monthly water quality data for one lake or reservoir and its coordinates.

  5. High-quality diffusion-weighted imaging of Parkinsons disease

    • dknet.org
    • scicrunch.org
    • +1 more
    Updated Jan 29, 2022
    + more versions
    Cite
    (2022). High-quality diffusion-weighted imaging of Parkinsons disease [Dataset]. http://identifiers.org/RRID:SCR_014121
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    A project that contains data and analysis pipelines for a set of 53 subjects in a cross-sectional Parkinson's disease (PD) study. The dataset contains diffusion-weighted images (DWI) of 27 PD patients and 26 age-, sex-, and education-matched control subjects. The DWIs were acquired with 120 unique gradient directions, b = 1000 and b = 2500 s/mm², and isotropic 2.4 mm³ voxels. The acquisition used a twice-refocused spin echo sequence to avoid distortions induced by eddy currents.

  6. BIG Dataset

    • paperswithcode.com
    Cite
    Ho Kei Cheng; Jihoon Chung; Yu-Wing Tai; Chi-Keung Tang, BIG Dataset [Dataset]. https://paperswithcode.com/dataset/big
    Explore at:
    Authors
    Ho Kei Cheng; Jihoon Chung; Yu-Wing Tai; Chi-Keung Tang
    Description

    A high-resolution semantic segmentation dataset with 50 validation and 100 test objects. Image resolution in BIG ranges from 2048×1600 to 5000×3600. Every image in the dataset has been carefully labeled by a professional while keeping the same guidelines as PASCAL VOC 2012 without the void region.

  7. PartImageNet Dataset

    • paperswithcode.com
    + more versions
    Cite
    Ju He; Shuo Yang; Shaokang Yang; Adam Kortylewski; Xiaoding Yuan; Jie-Neng Chen; Shuai Liu; Cheng Yang; Qihang Yu; Alan Yuille, PartImageNet Dataset [Dataset]. https://paperswithcode.com/dataset/partimagenet
    Explore at:
    Authors
    Ju He; Shuo Yang; Shaokang Yang; Adam Kortylewski; Xiaoding Yuan; Jie-Neng Chen; Shuai Liu; Cheng Yang; Qihang Yu; Alan Yuille
    Description

    PartImageNet is a large, high-quality dataset with part segmentation annotations. It consists of 158 classes from ImageNet with approximately 24,000 images. PartImageNet offers part-level annotations on a general set of classes with non-rigid, articulated objects, while being an order of magnitude larger than existing datasets. It can be used in multiple vision tasks, including but not limited to part discovery, semantic segmentation, and few-shot learning.

  8. DIV2K High Resolution Images

    • gts.ai
    json
    Updated Jul 15, 2024
    Cite
    GTS (2024). DIV2K High Resolution Images [Dataset]. https://gts.ai/dataset-download/div2k-high-resolution-images/
    Explore at:
    Available download formats: json
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Explore the DIV2K Dataset, a comprehensive collection of 1000 high-resolution RGB images designed for NTIRE and PIRM challenges.

  9. Mobile Icon | Mobile Screenshots Dataset

    • kaggle.com
    Updated Jan 30, 2025
    Cite
    DataCluster Labs (2025). Mobile Icon | Mobile Screenshots Dataset [Dataset]. https://www.kaggle.com/datasets/dataclusterlabs/mobile-icon-mobile-screenshots-dataset/data
    Explore at:
    Dataset updated
    Jan 30, 2025
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    DataCluster Labs
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Mobile Icon | Mobile Screenshot Dataset is a meticulously curated collection of 9,000+ high-quality mobile screenshots, categorized across 13 diverse application types. This dataset is designed to support AI/ML researchers, UI/UX analysts, and developers in advancing mobile interface understanding, image classification, and content recognition.

    Each image has been manually reviewed and verified by computer vision professionals at DataCluster Labs, ensuring high-quality and reliable data for research and development purposes.

    Categories Included

    • Technical Applications
    • Wallpapers
    • News & Magazines
    • Business & Finance
    • Agriculture
    • Entertainment and many more.

    Potential Applications:

    • AI & ML model training (image classification, UI/UX analysis, OCR).
    • Mobile app usability and accessibility research.
    • Content recognition and recommendation systems.

    The images in this dataset are exclusively owned by DataCluster Labs and were not downloaded from the internet. To access a larger portion of the training dataset for research and commercial purposes, a license can be purchased. Contact us at sales@datacluster.ai or visit www.datacluster.ai to learn more.

  10. Increasing Access to High-Quality Early Childhood Education 2013

    • catalog.data.gov
    • datasets.ai
    Updated Sep 27, 2024
    + more versions
    Cite
    Office for Civil Rights (OCR) (2024). Increasing Access to High-Quality Early Childhood Education 2013 [Dataset]. https://catalog.data.gov/dataset/increasing-access-to-high-quality-early-childhood-education-2013
    Explore at:
    Dataset updated
    Sep 27, 2024
    Dataset provided by
    Office for Civil Rights (OCR)
    Description

    The President believes we need to equip every child with the skills and education they need to be on a clear path to a good job and the middle class. To ensure these opportunities are available to all, President Obama has put forward a comprehensive early learning proposal to build a strong foundation for success in the first five years of life. These investments will help close America's school readiness gap and ensure that America's children enter kindergarten ready to succeed.

  11. High-Quality Invoice Images for OCR Dataset

    • paperswithcode.com
    Updated Apr 28, 2025
    Cite
    Freddy C. Chua; Nigel P. Duffy (2025). High-Quality Invoice Images for OCR Dataset [Dataset]. https://paperswithcode.com/dataset/high-quality-invoice-images-for-ocr
    Explore at:
    Dataset updated
    Apr 28, 2025
    Authors
    Freddy C. Chua; Nigel P. Duffy
    Description

    Dataset link: https://www.kaggle.com/datasets/osamahosamabdellatif/high-quality-invoice-images-for-ocr

    Overview

    High-Quality Invoice Images for OCR is a curated dataset containing professionally scanned and digitally captured invoice documents. It is designed for training, fine-tuning, and evaluating OCR models, machine learning pipelines, and data extraction systems.

    This dataset focuses on clean, structured invoices to simulate real-world scenarios in financial document automation.

    What's Inside

    • 📄 Variety of invoice templates from multiple industries (e.g., retail, manufacturing, services)
    • 🖋️ Different currencies, tax formats, and layouts
    • 📸 High-resolution scanned and photographed invoices
    • 🏷️ Optional field annotations (e.g., invoice number, date, total amount, vendor name) for supervised training

    Key Applications

    • Training and fine-tuning OCR and Document AI models
    • Machine learning for structured and semi-structured data extraction
    • Intelligent Document Processing (IDP) and Robotic Process Automation (RPA)
    • Benchmarking table detection, key-value extraction, and layout analysis models

    Why Use This Dataset?

    • ✅ High-quality images optimized for OCR and data extraction tasks
    • ✅ Real-world invoice variations to improve model robustness
    • ✅ Ideal for machine learning workflows in finance, ERP, and accounting systems
    • ✅ Supports rapid prototyping for invoice understanding models

    Ideal For

    • Researchers working on OCR and document understanding
    • Developers building invoice processing systems
    • Machine learning engineers fine-tuning models for data extraction
    • Startups and enterprises automating financial workflows

  12. Data Labeling Market Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Mar 8, 2025
    Cite
    Data Insights Market (2025). Data Labeling Market Report [Dataset]. https://www.datainsightsmarket.com/reports/data-labeling-market-20383
    Explore at:
    Available download formats: doc, ppt, pdf
    Dataset updated
    Mar 8, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The data labeling market is experiencing robust growth, projected to reach $3.84 billion in 2025 and maintain a compound annual growth rate (CAGR) of 28.13% from 2025 to 2033. This expansion is fueled by the increasing demand for high-quality training data across sectors such as healthcare, automotive, and finance, which rely heavily on machine learning and artificial intelligence (AI). The surge in AI adoption, particularly in areas like autonomous vehicles, medical image analysis, and fraud detection, necessitates vast quantities of accurately labeled data.

    The market is segmented by sourcing type (in-house vs. outsourced), data type (text, image, audio), labeling method (manual, automatic, semi-supervised), and end-user industry. Outsourcing is expected to dominate the sourcing segment due to cost-effectiveness and access to specialized expertise. Similarly, image data labeling is likely to hold a significant share, given the visual nature of many AI applications. The shift towards automation and semi-supervised techniques aims to improve efficiency and reduce labeling costs, though manual labeling will remain crucial for tasks requiring high accuracy and nuanced understanding. Geographical distribution shows strong potential across North America and Europe, with Asia-Pacific emerging as a key growth region driven by increasing technological advancements and digital transformation.

    Competition in the data labeling market is intense, with a mix of established players like Amazon Mechanical Turk and Appen alongside emerging specialized companies. The market's future trajectory will likely be shaped by advancements in automation technologies, the development of more efficient labeling techniques, and the increasing need for specialized data labeling services catering to niche applications. Companies are focusing on improving the accuracy and speed of data labeling through innovations in AI-powered tools and techniques. Furthermore, the rise of synthetic data generation offers a promising avenue for supplementing real-world data, potentially addressing data scarcity challenges and reducing labeling costs in certain applications. This will, however, require careful attention to ensure that the synthetic data generated is representative of real-world data to maintain model accuracy.

    This comprehensive report provides an in-depth analysis of the global data labeling market, offering insights for businesses, investors, and researchers. The study period covers 2019-2033, with 2025 as the base and estimated year, and a forecast period of 2025-2033. We delve into market size, segmentation, growth drivers, challenges, and emerging trends, examining the impact of technological advancements and regulatory changes on this rapidly evolving sector. The market is projected to reach multi-billion dollar valuations by 2033, fueled by the increasing demand for high-quality data to train sophisticated machine learning models.

    Recent developments include:

    • September 2024: The National Geospatial-Intelligence Agency (NGA) is poised to invest heavily in artificial intelligence, earmarking up to USD 700 million for data labeling services over the next five years. This initiative aims to enhance NGA's machine-learning capabilities, particularly in analyzing satellite imagery and other geospatial data. The agency has opted for a multi-vendor indefinite-delivery/indefinite-quantity (IDIQ) contract, emphasizing the importance of annotating raw data, be it images or videos, to render it understandable for machine learning models. For instance, when dealing with satellite imagery, the focus could be on labeling distinct entities such as buildings, roads, or patches of vegetation.
    • October 2023: Refuel.ai unveiled a new platform, Refuel Cloud, and a specialized large language model (LLM) for data labeling. Refuel Cloud harnesses advanced LLMs, including its proprietary model, to automate data cleaning, labeling, and enrichment at scale, catering to diverse industry use cases. Recognizing that clean data underpins modern AI and data-centric software, Refuel Cloud addresses the historical challenge of human labor bottlenecks in data production. With Refuel Cloud, enterprises can generate the expansive, precise datasets they require in mere minutes, a task that traditionally spanned weeks.

    Key drivers for this market are: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML. Potential restraints include: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML. Notable trends are: Healthcare is Expected to Witness Remarkable Growth.
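
    The description quotes a 2025 market size of $3.84 billion and a CAGR of 28.13% for 2025 to 2033. As a rough illustration of what those two figures imply together, the sketch below compounds the base value over eight annual periods. The eight-period count and the function names are assumptions made for this example; the report itself does not state how it compounds or what 2033 value it projects.

```python
def project_value(base_value: float, cagr: float, periods: int) -> float:
    """Compound a base value forward: base * (1 + rate) ** periods."""
    return base_value * (1.0 + cagr) ** periods


def implied_cagr(start_value: float, end_value: float, periods: int) -> float:
    """Invert the compounding formula to recover the annual growth rate."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# Figures quoted in the description above (USD billions, fractional rate).
base_2025 = 3.84
cagr = 0.2813
periods = 2033 - 2025  # assumption: one compounding step per calendar year

projected_2033 = project_value(base_2025, cagr, periods)
print(f"Implied 2033 market size: ${projected_2033:.2f} billion")

# Round-trip check: the rate recovered from the projection matches the input.
print(f"Recovered CAGR: {implied_cagr(base_2025, projected_2033, periods):.4%}")
```

    Under these assumptions the implied 2033 figure comes out near $28 billion, which is consistent with the "multi-billion dollar" wording in the description but should not be read as the report's own forecast.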

  13. GlobalHighPM₂.₅: Global Daily Seamless 1 km Ground-Level PM₂.₅ Dataset over...

    • zenodo.org
    nc, pdf, zip
    Updated May 23, 2025
    + more versions
    Cite
    Jing Wei; Zhanqing Li; Alexei Lyapustin; Jun Wang; Oleg Dubovik; Joel Schwartz; Lin Sun; Chi Li; Song Liu; Tong Zhu (2025). GlobalHighPM₂.₅: Global Daily Seamless 1 km Ground-Level PM₂.₅ Dataset over Land (2017–Present) [Dataset]. http://doi.org/10.5281/zenodo.10800980
    Explore at:
    Available download formats: nc, zip, pdf
    Dataset updated
    May 23, 2025
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Jing Wei; Zhanqing Li; Alexei Lyapustin; Jun Wang; Oleg Dubovik; Joel Schwartz; Lin Sun; Chi Li; Song Liu; Tong Zhu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 11, 2022
    Description

    GlobalHighPM2.5 is part of a series of long-term, seamless, global, high-resolution, and high-quality datasets of air pollutants over land (i.e., GlobalHighAirPollutants, GHAP). It is generated from big data sources (e.g., ground-based measurements, satellite remote sensing products, atmospheric reanalysis, and model simulations) using artificial intelligence, taking into account the spatiotemporal heterogeneity of air pollution.

    This dataset contains the input data, analysis code, and generated dataset used for the following article. If you use the GlobalHighPM2.5 dataset in your scientific research, please cite the following reference (Wei et al., NC, 2023):

    Input Data

    Relevant raw data for each figure (compiled into a single sheet within an Excel document) in the manuscript.

    Code

    Relevant Python scripts for replicating and plotting the analysis results in the manuscript, as well as code for converting data formats.

    Generated Dataset

    Here is the first big data-derived seamless (spatial coverage = 100%) daily, monthly, and yearly 1 km (i.e., D1K, M1K, and Y1K) global ground-level PM2.5 dataset over land from 2017 to the present. This dataset exhibits high quality, with cross-validation coefficients of determination (CV-R²) of 0.91, 0.97, and 0.98, and root-mean-square errors (RMSEs) of 9.20, 4.15, and 2.77 µg m⁻³ on the daily, monthly, and annual bases, respectively.
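
    The two quality metrics quoted above, the coefficient of determination and the RMSE, can be computed from paired observed and predicted values. The sketch below shows the standard definitions only; it assumes R² is defined as 1 - SS_res / SS_tot, uses made-up sample values, and does not reproduce the authors' cross-validation procedure.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical PM2.5 values in µg/m³, invented purely for illustration.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

r2_value = r_squared(observed, predicted)
rmse_value = rmse(observed, predicted)
print(f"R^2 = {r2_value:.3f}, RMSE = {rmse_value:.2f} µg/m³")
```

    In a cross-validation setting, the predicted values would come from models fitted without the corresponding observations; that split is outside the scope of this sketch.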

    Due to data volume limitations:

    • All (including daily) data for the year 2022 are accessible at: GlobalHighPM2.5 (2022)
    • All (including daily) data for the year 2021 are accessible at: GlobalHighPM2.5 (2021)
    • All (including daily) data for the year 2020 are accessible at: GlobalHighPM2.5 (2020)
    • All (including daily) data for the year 2019 are accessible at: GlobalHighPM2.5 (2019)
    • All (including daily) data for the year 2018 are accessible at: GlobalHighPM2.5 (2018)
    • All (including daily) data for the year 2017 are accessible at: GlobalHighPM2.5 (2017)
    • Continuously updated...

    More GHAP datasets for different air pollutants are available at: https://weijing-rs.github.io/product.html

  14. ScaleCap-450k

    • huggingface.co
    Updated Jun 25, 2025
    Cite
    Long Xing (2025). ScaleCap-450k [Dataset]. https://huggingface.co/datasets/long-xing1/ScaleCap-450k
    Explore at:
    Dataset updated
    Jun 25, 2025
    Authors
    Long Xing
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    [Paper] https://arxiv.org/abs/2506.19848 [GitHub] https://github.com/Cooperx521/ScaleCap

    ScaleCap450k: hyper-detailed and high-quality image captions

    Dataset details

    This dataset contains 450k image-caption pairs, where the captions are annotated using the ScaleCap pipeline. For more details, please refer to the paper. In collecting images for our dataset, we primarily focus on two aspects: diversity and richness of image content. Given that the ShareGPT4V-100k already… See the full description on the dataset page: https://huggingface.co/datasets/long-xing1/ScaleCap-450k.

  15. GlobalHighPM2.5: Global high-resolution and high-quality ground-level PM2.5...

    • data.tpdc.ac.cn
    • tpdc.ac.cn
    zip
    Updated Mar 18, 2024
    + more versions
    Cite
    Jing WEI; Zhanqing LI (2024). GlobalHighPM2.5: Global high-resolution and high-quality ground-level PM2.5 dataset over land (2017-2022) [Dataset]. http://doi.org/10.5281/zenodo.6449740
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 18, 2024
    Dataset provided by
    TPDC
    Authors
    Jing WEI; Zhanqing LI
    Area covered
    Description

    GlobalHighPM2.5 is one of a series of long-term, full-coverage, global, high-resolution, and high-quality datasets of ground-level air pollutants over land (i.e., GlobalHighAirPollutants, GHAP). It is generated from big data (e.g., ground-based measurements, satellite remote sensing products, atmospheric reanalysis, and model simulations) using artificial intelligence, taking into account the spatiotemporal heterogeneity of air pollution. The ten-fold cross-validation coefficient of determination (R²) is 0.91, and the root-mean-square error (RMSE) is 9.2 µg/m³. The dataset covers the entire global land area, with a spatial resolution of 1 km and daily, monthly, and yearly temporal resolutions; values are in µg/m³. Attention: this dataset is recorded in Universal Time (UTC, GMT+0) and is continuously updated. If you need more data, please contact the author by email (weijing_rs@163.com; weijing@umd.edu). The data file includes nc2geotiff code in four languages (Python, Matlab, IDL, and R) for converting NC to GeoTiff.

  16. THVD (Talking Head Video Dataset)

    • kaggle.com
    Updated Apr 28, 2025
    + more versions
    Cite
    LipSynthesis (2025). THVD (Talking Head Video Dataset) [Dataset]. https://www.kaggle.com/datasets/mariopd/talking-head-video-dataset-23k-identities
    Explore at:
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    LipSynthesis
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    About

    We provide a comprehensive talking-head video dataset with 50,000+ videos, totaling more than 500 hours of footage and featuring 20,841 unique identities from around the world.

    Distribution

    Detailing the format, size, and structure of the dataset:

    Data Volume:

    • Total size: 2.7 TB
    • Total videos: 47,547
    • Identities covered: 20,841
    • Resolution: 60% 4k (1980), 33% full HD (1080)
    • Formats: MP4
    • Full-length videos with visible mouth movements in every frame.
    • Minimum face size of 400 pixels.
    • Video durations range from 20 seconds to 5 minutes.
    • Faces have not been cut out; videos are full screen and include backgrounds.

    Usage

    This dataset is ideal for a variety of applications:

    Face Recognition & Verification: Training and benchmarking facial recognition models.

    Action Recognition: Identifying human activities and behaviors.

    Re-Identification (Re-ID): Tracking identities across different videos and environments.

    Deepfake Detection: Developing methods to detect manipulated videos.

    Generative AI: Training high-resolution video generation models.

    Lip Syncing Applications: Enhancing AI-driven lip-syncing models for dubbing and virtual avatars.

    Background AI Applications: Developing AI models for automated background replacement, segmentation, and enhancement.

    Coverage

    Explaining the scope and coverage of the dataset:

    Geographic Coverage: Worldwide

    Time Range: Time range and size of the videos have been noted in the CSV file.

    Demographics: Includes information about age, gender, ethnicity, format, resolution, and file size.

    Languages Covered (Videos):

    • English: 23,038 videos
    • Portuguese: 1,346 videos
    • Spanish: 677 videos
    • Norwegian: 1,266 videos
    • Swedish: 1,056 videos
    • Korean: 848 videos
    • Polish: 1,807 videos
    • Indonesian: 1,163 videos
    • French: 1,102 videos
    • German: 1,276 videos
    • Japanese: 1,433 videos
    • Dutch: 1,666 videos
    • Indian: 1,163 videos
    • Czech: 590 videos
    • Chinese: 685 videos
    • Italian: 975 videos
    • Filipino: 920 videos
    • Bulgarian: 340 videos
    • Romanian: 1,144 videos
    • Arabic: 1,691 videos


    Who Can Use It

    List examples of intended users and their use cases:

    Data Scientists: Training machine learning models for video-based AI applications.

    Researchers: Studying human behavior, facial analysis, or video AI advancements.

    Businesses: Developing facial recognition systems, video analytics, or AI-driven media applications.

    Additional Notes

    Ensure ethical usage and compliance with privacy regulations. The dataset's quality and scale make it valuable for high-performance AI training. Preprocessing (cropping, downsampling) may be needed for some use cases. The dataset is not yet complete and expands daily; please get in touch for the most up-to-date CSV file. The dataset has been divided into 20 GB zipped files and is hosted on a private server (with the option to upload to the cloud if needed). To verify the dataset's quality, please contact me for the full CSV file. I'd be happy to provide example videos selected by the potential buyer.

  17. High Resolution Voyager 2 Images of Neptune's Moon, Triton

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). High Resolution Voyager 2 Images of Neptune's Moon, Triton [Dataset]. https://catalog.data.gov/dataset/high-resolution-voyager-2-images-of-neptunes-moon-triton
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey http://www.usgs.gov/
    Description

    We processed 41 Voyager 2 images of Neptune's moon Triton with pixel scales < 2 km/pixel from their raw, compressed, archived state to more usable cloud-optimized GeoTIFFs, which can easily be used within spatial analysis software such as GIS. Processing was done using the USGS ISIS software and included geometric and radiometric calibration, and the removal of image reseaux and corner markers (originally used for geometric calibration). The images were also photogrammetrically controlled relative to one another to improve their locations on the surface, which were initially inaccurate by up to 200 km. After performing a least squares bundle adjustment, the root mean square (RMS) uncertainty in image locations was 0.50, 0.52, and 0.51 pixels in latitude, longitude, and radius, respectively, with minimum and maximum residuals of -4.21 and +3.197 pixels, respectively. Each individual image was warped to an orthographic projection centered at 15° W and 18° N at the native image resolution. Because reseaux removal introduces interpolated data, two versions of each image are provided: a fully processed version with reseaux removed, and a partially processed version that retains the reseaux (i.e., no interpolation). We also generated a mosaic of Triton images that is spatially consistent with the entire dataset and provides context for the individual images (not every image is included in the mosaic). The mosaic uses the same orthographic projection as the individual images, but a consistent scale of 600 m/pixel. Three versions of the mosaic are included: a fully processed version (reseaux removed, some interpolated pixel values), a partially processed version (reseaux retained, no interpolation), and a fully processed and sharpened version (high-pass filter with 100% albedo add-back) to enhance surface features. This data release improves the usability and accessibility of this singular dataset and enables new scientific investigations of Triton.

  18. United States - 61.5-Year High Quality Market (HQM) Corporate Bond Spot Rate...

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Feb 24, 2020
    Cite
    TRADING ECONOMICS (2020). United States - 61.5-Year High Quality Market (HQM) Corporate Bond Spot Rate [Dataset]. https://tradingeconomics.com/united-states/61-5-year-high-quality-market-hqm-corporate-bond-spot-rate-fed-data.html
    Explore at:
    Available download formats: csv, json, xml, excel
    Dataset updated
    Feb 24, 2020
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1976 - Dec 31, 2025
    Area covered
    United States
    Description

    United States - 61.5-Year High Quality Market (HQM) Corporate Bond Spot Rate was 5.95% in March of 2025, according to the United States Federal Reserve. Historically, the rate reached a record high of 12.50% in June of 1984 and a record low of 3.09% in December of 2020. Trading Economics provides the current actual value, a historical data chart, and related indicators for United States - 61.5-Year High Quality Market (HQM) Corporate Bond Spot Rate, last updated from the United States Federal Reserve in May of 2025.

  19. High-Quality Wetlands

    • data-wi-dnr.opendata.arcgis.com
    Updated Jul 26, 2023
    Cite
    Wisconsin Department of Natural Resources (2023). High-Quality Wetlands [Dataset]. https://data-wi-dnr.opendata.arcgis.com/datasets/high-quality-wetlands/explore
    Explore at:
    Dataset updated
    Jul 26, 2023
    Dataset authored and provided by
    Wisconsin Department of Natural Resources http://dnr.wi.gov/
    Area covered
    Description

    High-Quality Wetland points displayed in the DNR Watershed Restoration and Protection Viewer. These are unique wetlands and wetlands with least-disturbed or reference conditions. For legal and privacy reasons, each point represents a generalized area. All points are in HUCs that fall mostly within Wisconsin.

  20. High-resolution infrared color satellite cloud map - East Asia

    • data.gov.tw
    json, xml
    + more versions
    Cite
    Central Weather Administration Ministry of Transportation and Communications, High-resolution infrared color satellite cloud map - East Asia [Dataset]. https://data.gov.tw/en/datasets/8193
    Explore at:
    Available download formats: json, xml
    Dataset authored and provided by
    Central Weather Administration Ministry of Transportation and Communications
    License

    https://data.gov.tw/license

    Area covered
    East Asia, Asia
    Description

    High-resolution satellite cloud image data. Note: the download URL changed on September 15, 2023. Please switch to the new URL by December 31, 2023; the old link will expire after that deadline. Users who need to download a large amount of data should apply for membership at the open platform for meteorological data: https://opendata.cwa.gov.tw/index
