100+ datasets found
  1. Generative AI Applications

    • kaggle.com
    zip
    Updated Jul 16, 2023
    Cite
    Niyamat Ullah (2023). Generative AI Applications [Dataset]. https://www.kaggle.com/datasets/niyamatalmass/generative-ai-applications
    Available download formats: zip (82,678 bytes)
    Dataset updated
    Jul 16, 2023
    Authors
    Niyamat Ullah
    Description

    Dataset

    This dataset was created by Niyamat Ullah

    Contents

  2. Global Generative AI Tools Landscape 2025

    • kaggle.com
    zip
    Updated Sep 27, 2025
    Cite
    Warda Bilal (2025). Global Generative AI Tools Landscape 2025 [Dataset]. https://www.kaggle.com/datasets/wardabilal/global-generative-ai-tools-landscape-2025
    Available download formats: zip (3,501 bytes)
    Dataset updated
    Sep 27, 2025
    Authors
    Warda Bilal
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data Description

    This dataset summarizes the generative AI platforms and tools available as of 2025. It details each tool's name, company, release year, category, modalities (text, image, video, audio, code, multimodal), open-source status, and API availability. Researchers, students, data analysts, and AI enthusiasts can use it to better understand the rapidly expanding field of artificial intelligence.

    By using this dataset, you can:

    • Examine patterns across AI fields and companies.
    • Compare proprietary and open-source AI solutions.
    • Track how modalities and APIs are being adopted in generative AI.
    • Explore the possible effects on computer science research, business, employment, and careers.

    This makes it useful in academic and professional settings for data analytics, business insights, career research, and AI innovation studies.
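    Comparing open-source and proprietary tools across modalities comes down to a cross-tabulation. The sketch below assumes the archive holds a single CSV file. The column names `modality` and `open_source`, and the way the open-source flag is encoded, are also assumptions, so check the real header before using it.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# Strings treated as "yes" in the open-source column. The real encoding
# (True/False, Yes/No, 1/0) is an assumption; adjust after inspecting the CSV.
TRUTHY = {"true", "yes", "y", "1"}


def open_source_by_modality(
    f: TextIO,
    modality_col: str = "modality",        # hypothetical column name
    open_source_col: str = "open_source",  # hypothetical column name
) -> Counter:
    """Count tools per (modality, is_open_source) pair in a CSV file object."""
    counts: Counter = Counter()
    for row in csv.DictReader(f):
        modality = row[modality_col].strip().lower()
        is_open = row[open_source_col].strip().lower() in TRUTHY
        counts[(modality, is_open)] += 1
    return counts


if __name__ == "__main__":
    # Made-up rows standing in for the real file.
    sample = io.StringIO(
        "tool,modality,open_source\n"
        "ToolA,text,True\n"
        "ToolB,text,False\n"
        "ToolC,image,True\n"
    )
    for (modality, is_open), n in sorted(open_source_by_modality(sample).items()):
        label = "open-source" if is_open else "proprietary"
        print(f"{modality:<8} {label:<12} {n}")
```

    For the real file, pass `open(path, newline="", encoding="utf-8")` together with the actual column names.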

  3. Generative AI tool

    • kaggle.com
    zip
    Updated Sep 27, 2025
    Cite
    Ali Hussain (2025). Generative AI tool [Dataset]. https://www.kaggle.com/datasets/aliiihussain/generative-ai-tool
    Available download formats: zip (3,501 bytes)
    Dataset updated
    Sep 27, 2025
    Authors
    Ali Hussain
    License

    ODC Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset collects popular generative AI tools that are shaping creativity, productivity, and automation. It includes details such as tool name, company, category, modality_canonical (e.g., text, image, video, music, code), open source, api_available, api_status, release year, and key features.

  4. GenAI-Public-Response

    • kaggle.com
    Updated Nov 27, 2023
    Cite
    SSRETRO (2023). GenAI-Public-Response [Dataset]. https://www.kaggle.com/datasets/ssretro/generative-ai-where-its-headed
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 27, 2023
    Dataset provided by
    Kaggle
    Authors
    SSRETRO
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by SSRETRO

    Released under MIT

    Contents

  5. Google Generative AI Documentation

    • kaggle.com
    zip
    Updated Dec 18, 2023
    Cite
    Bhavik Jikadara (2023). Google Generative AI Documentation [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/google-generative-ai-documentation
    Available download formats: zip (415 bytes)
    Dataset updated
    Dec 18, 2023
    Authors
    Bhavik Jikadara
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Google Gemini is an AI model that can work across text, images, video, audio, and code. It is designed to be multimodal, and Google reported that Gemini Ultra was the first model to outperform human experts on Massive Multitask Language Understanding (MMLU).

  6. CIFAKE: Real and AI-Generated Synthetic Images

    • kaggle.com
    Updated Mar 28, 2023
    Cite
    Jordan J. Bird (2023). CIFAKE: Real and AI-Generated Synthetic Images [Dataset]. https://www.kaggle.com/datasets/birdy654/cifake-real-and-ai-generated-synthetic-images
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 28, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jordan J. Bird
    Description

    CIFAKE: Real and AI-Generated Synthetic Images

    The quality of AI-generated images has rapidly increased, raising concerns about authenticity and trustworthiness.

    CIFAKE is a dataset that contains 60,000 synthetically generated images and 60,000 real images (collected from CIFAR-10). Can computer vision techniques be used to detect whether an image is real or has been generated by AI?

    Further information on this dataset can be found here: Bird, J.J. and Lotfi, A., 2024. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access.

    Dataset details

    The dataset contains two classes: REAL and FAKE.

    For REAL, we collected the images from Krizhevsky & Hinton's CIFAR-10 dataset.

    For FAKE, we generated the equivalent of CIFAR-10 with Stable Diffusion version 1.4.

    There are 100,000 images for training (50k per class) and 20,000 for testing (10k per class).
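    The split sizes above can be checked once the archive is extracted. This sketch assumes a layout of root/<split>/<class>/*.jpg (train/REAL, train/FAKE, test/REAL, test/FAKE), which the description does not state. The extraction directory name `cifake` is also hypothetical. FAKE is labelled 1 and REAL 0.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path

CLASSES = ("REAL", "FAKE")  # index doubles as the label: REAL=0, FAKE=1
SPLITS = {"train": 50_000, "test": 10_000}  # images per class, per the description
# Expected count for every (split, class) pair: 50k/50k train, 10k/10k test.
EXPECTED = {(split, cls): n for split, n in SPLITS.items() for cls in CLASSES}


def index_images(root: Path) -> list[tuple[str, Path, int]]:
    """Collect (split, path, label) triples from root/<split>/<class>/*.jpg."""
    samples = []
    for split in SPLITS:
        for cls in CLASSES:
            folder = root / split / cls
            if folder.is_dir():
                for path in sorted(folder.glob("*.jpg")):
                    samples.append((split, path, int(cls == "FAKE")))
    return samples


def count_by_split(samples: list[tuple[str, Path, int]]) -> Counter:
    """Count images per (split, class name)."""
    return Counter((split, CLASSES[label]) for split, _, label in samples)


def mismatches(counts: Counter) -> dict[tuple[str, str], tuple[int, int]]:
    """Return {(split, class): (found, expected)} wherever the counts differ."""
    return {
        key: (counts[key], want)
        for key, want in EXPECTED.items()
        if counts[key] != want
    }


if __name__ == "__main__":
    root = Path("cifake")  # hypothetical extraction directory
    if root.is_dir():
        counts = count_by_split(index_images(root))
        print(counts)
        print(mismatches(counts) or "all splits match the description")
```

    Only the ".jpg" extension is matched, in line with the note below that the files were renamed from ".jpeg".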

    Papers with Code

    The dataset and all studies using it are linked using Papers with Code https://paperswithcode.com/dataset/cifake-real-and-ai-generated-synthetic-images

    References

    If you use this dataset, you must cite the following sources:

    Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images.

    Bird, J.J. and Lotfi, A., 2024. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access.

    Real images are from Krizhevsky & Hinton (2009), fake images are from Bird & Lotfi (2024). The Bird & Lotfi study is available here.

    Notes

    The updates to the dataset on 28 March 2023 did not change its content: the ".jpeg" file extensions were renamed to ".jpg" and the root folder was uploaded to meet Kaggle's usability requirements.

    License

    This dataset is published under the same MIT license as CIFAR-10:

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

  7. GENERATIVE AI FOR SCIENTIFIC DOCUMENT INSIGHT

    • kaggle.com
    zip
    Updated Apr 5, 2025
    Cite
    Sourav Kumar Khawas (2025). GENERATIVE AI FOR SCIENTIFIC DOCUMENT INSIGHT [Dataset]. https://www.kaggle.com/datasets/souravkumarkhawas/generative-ai-for-scientific-document-insight
    Available download formats: zip (49,453 bytes)
    Dataset updated
    Apr 5, 2025
    Authors
    Sourav Kumar Khawas
    Description

    Dataset

    This dataset was created by Sourav Kumar Khawas

    Released under Other (specified in description)

    Contents

  8. Generative AI Opinion Dataset on Twitter

    • kaggle.com
    zip
    Updated Jan 7, 2025
    Cite
    Pinjem Akun Falif (2025). Generative AI Opinion Dataset on Twitter [Dataset]. https://www.kaggle.com/datasets/msfalif404/generative-ai-opinion-dataset-on-twitter
    Available download formats: zip (2,405,814 bytes)
    Dataset updated
    Jan 7, 2025
    Authors
    Pinjem Akun Falif
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Description

    This dataset contains opinions about generative AI collected from the social media platform Twitter. The data was gathered from 2021 to 2024 using relevant keywords such as:

    • generative ai opinion
    • generative ai thought
    • generative ai controversy
    • generative ai impact
    • generative ai ethics
    • generative ai risks
    • generative ai vs human
    • ai generated misinformation
    • generative ai backlash
    • generative ai opportunities
    • generative ai policy
    • #AIFailure
    • ai helps human
    • future of generative ai

    The dataset aims to provide insights into public opinions about Generative AI, highlighting both its opportunities and challenges, and is expected to be valuable for research or analysis on public opinion, ethics, policies, and the impacts of Generative AI.

    Citation: Falif, Muhammad Sya'bani. Generative AI Opinion Dataset on Twitter. 2024. Kaggle.

  9. Generative AI Tools & Platforms Landscape

    • kaggle.com
    zip
    Updated Sep 27, 2025
    Cite
    Eman Fatima (2025). Generative AI Tools & Platforms Landscape [Dataset]. https://www.kaggle.com/datasets/emanfatima2025/generative-ai-tools-and-platforms-landscape
    Available download formats: zip (3,501 bytes)
    Dataset updated
    Sep 27, 2025
    Authors
    Eman Fatima
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Description of the dataset

    This dataset offers a curated summary of 113 generative AI platforms and technologies available in 2025. Each entry details the company, category, year of release, accessibility (API and open-source availability), and supported modalities (text, image, video, audio, code, design, productivity, safety, infrastructure, multimodal).

    Researchers, developers, and analysts can use the dataset to evaluate platforms based on their features and accessibility, follow trends across many categories, and gain a better understanding of the dynamic AI ecosystem.

    Key attributes:

    Tool details: name, company, domain, and website

    Modalities and categories: canonical flags for modality and category

    Accessibility: API availability and status, open-source status

    Timeline: year of release, years since release

    Capabilities: support for text, image, video, audio, code, design, productivity, infrastructure, safety, and multimodal
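    Each capability is stored as a flag, so a per-capability tally needs only a short loop. This sketch assumes a CSV with one column per capability holding yes/no-style values. The column names in the example (`text`, `image`, `video`) and the flag encoding are assumptions, not taken from the dataset.

```python
from __future__ import annotations

import csv
import io
from typing import Sequence, TextIO

# Strings counted as "flag set"; the dataset's real encoding is an assumption.
TRUTHY = {"true", "yes", "y", "1"}


def capability_counts(f: TextIO, flag_cols: Sequence[str]) -> dict[str, int]:
    """Count how many tools have each capability flag set."""
    counts = {col: 0 for col in flag_cols}
    for row in csv.DictReader(f):
        for col in flag_cols:
            # A missing column or short row yields None, treated as "not set".
            if (row.get(col) or "").strip().lower() in TRUTHY:
                counts[col] += 1
    return counts


if __name__ == "__main__":
    sample = io.StringIO(  # made-up rows with hypothetical column names
        "name,text,image,video\n"
        "ToolA,1,0,0\n"
        "ToolB,1,1,0\n"
        "ToolC,0,1,1\n"
    )
    print(capability_counts(sample, ["text", "image", "video"]))
```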

  10. Generative AI

    • kaggle.com
    zip
    Updated Jul 19, 2024
    Cite
    Mohsen Mostafa (2024). Generative AI [Dataset]. https://www.kaggle.com/datasets/babydriver1233/generative-ai
    Available download formats: zip (426,063,072 bytes)
    Dataset updated
    Jul 19, 2024
    Authors
    Mohsen Mostafa
    Description

    With the exponential rise in the consumption of audio-visual content, rapid video content creation has become a quintessential need. At the same time, making these videos accessible in different languages is also a key challenge. For instance, a deep learning lecture series, a famous movie, or a public address to the nation, if translated to desired target languages, can become accessible to millions of new viewers. A crucial aspect of translating such talking face videos or creating new ones is correcting the lip sync to match the desired target speech. Consequently, lip-syncing talking face videos to match a given input audio stream has received considerable attention in the research community.

    This dataset demonstrates the use of the Wav2Lip model, a state-of-the-art lip-syncing tool that can generate highly accurate and realistic lip-synced videos from arbitrary input audio and video sources.
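    The description does not show the commands used. As a rough sketch, assume a local clone of the public Wav2Lip repository and a downloaded checkpoint. Its inference script then takes a checkpoint, a face video, and an audio file. The file paths and checkpoint filename below are placeholders.

```python
from __future__ import annotations

import shlex
import sys


def wav2lip_command(
    face: str,
    audio: str,
    checkpoint: str = "checkpoints/wav2lip_gan.pth",  # hypothetical local path
) -> list[str]:
    """Build argv for Wav2Lip's inference.py.

    Relative paths are resolved from the Wav2Lip repository root, so run the
    command with that directory as the working directory.
    """
    return [
        sys.executable,  # swap in the Wav2Lip environment's interpreter if it differs
        "inference.py",
        "--checkpoint_path", checkpoint,
        "--face", face,
        "--audio", audio,
    ]


if __name__ == "__main__":
    cmd = wav2lip_command("input/speaker.mp4", "input/speech.wav")  # hypothetical files
    print(shlex.join(cmd))
    # To execute (requires a local clone of Wav2Lip plus its weights):
    # import subprocess; subprocess.run(cmd, cwd="Wav2Lip", check=True)
```

    The command is built as an argument list rather than a shell string, so paths containing spaces need no extra quoting.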

  11. generative ai

    • kaggle.com
    zip
    Updated Oct 29, 2023
    Cite
    Joko Slamet (2023). generative ai [Dataset]. https://www.kaggle.com/datasets/jokoslamet99/generative-ai
    Available download formats: zip (6,370 bytes)
    Dataset updated
    Oct 29, 2023
    Authors
    Joko Slamet
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Joko Slamet

    Released under Apache 2.0

    Contents

  12. Bitext Gen AI Chatbot Customer Support Dataset

    • kaggle.com
    zip
    Updated Mar 18, 2024
    Cite
    Bitext (2024). Bitext Gen AI Chatbot Customer Support Dataset [Dataset]. https://www.kaggle.com/datasets/bitext/bitext-gen-ai-chatbot-customer-support-dataset
    Available download formats: zip (3,007,665 bytes)
    Dataset updated
    Mar 18, 2024
    Authors
    Bitext
    License

    Community Data License Agreement, Sharing, Version 1.0 (CDLA-Sharing-1.0): https://cdla.io/sharing-1-0/

    Description

    Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

    Overview

    This dataset can be used to train large language models such as GPT, Llama 2, and Falcon, for both fine-tuning and domain adaptation.

    The dataset has the following specs:

    • Use Case: Intent Detection
    • Vertical: Customer Service
    • 27 intents assigned to 10 categories
    • 26872 question/answer pairs, around 1000 per intent
    • 30 entity/slot types
    • 12 different types of language generation tags

    The categories and intents have been selected from Bitext's collection of 20 vertical-specific datasets, covering the intents that are common across all 20 verticals. The verticals are:

    • Automotive, Retail Banking, Education, Events & Ticketing, Field Services, Healthcare, Hospitality, Insurance, Legal Services, Manufacturing, Media Streaming, Mortgages & Loans, Moving & Storage, Real Estate/Construction, Restaurant & Bar Chains, Retail/E-commerce, Telecommunications, Travel, Utilities, Wealth Management

    For a full list of verticals and their intents, see https://www.bitext.com/chatbot-verticals/.

    The question/answer pairs have been generated using a hybrid methodology that uses natural texts as source text, NLP technology to extract seeds from these texts, and NLG technology to expand the seed texts. All steps in the process are curated by computational linguists.

    Dataset Token Count

    The 'instruction' and 'response' columns together contain a total of 3.57 million tokens after processing and tokenization. This makes the dataset suitable for training LLMs for conversational AI, generative AI, and question answering (Q&A).

    Fields of the Dataset

    Each entry in the dataset contains the following fields:

    • flags: tags (explained below in the Language Generation Tags section)
    • instruction: a user request from the Customer Service domain
    • category: the high-level semantic category for the intent
    • intent: the intent corresponding to the user instruction
    • response: an example expected response from the virtual assistant
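    With these fields, you can check the per-intent balance (the description states roughly 1,000 pairs per intent) in a few lines. This sketch assumes the data ships as a CSV whose header uses exactly these field names. The description does not give the file name, so the example reads a small made-up in-memory sample instead.

```python
from __future__ import annotations

import csv
import io
from collections import Counter, defaultdict
from typing import TextIO


def intents_by_category(f: TextIO) -> dict[str, Counter]:
    """Group rows by 'category' and count 'intent' values within each group."""
    grouped: defaultdict[str, Counter] = defaultdict(Counter)
    for row in csv.DictReader(f):
        grouped[row["category"]][row["intent"]] += 1
    return dict(grouped)


if __name__ == "__main__":
    # Made-up rows using the documented field names; 'flags' is left empty here.
    sample = io.StringIO(
        "flags,instruction,category,intent,response\n"
        ',"how do I cancel order {{Order Number}}?",ORDER,cancel_order,"I can help."\n'
        ",where is my package,ORDER,track_order,Let me check.\n"
        ",I want my money back,REFUND,get_refund,Sure.\n"
    )
    for category, intents in sorted(intents_by_category(sample).items()):
        print(category, dict(intents))
```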

    Categories and Intents

    The categories and intents covered by the dataset are:

    • ACCOUNT: create_account, delete_account, edit_account, recover_password, registration_problems, switch_account
    • CANCELLATION_FEE: check_cancellation_fee
    • CONTACT: contact_customer_service, contact_human_agent
    • DELIVERY: delivery_options, delivery_period
    • FEEDBACK: complaint, review
    • INVOICE: check_invoice, get_invoice
    • ORDER: cancel_order, change_order, place_order, track_order
    • PAYMENT: check_payment_methods, payment_issue
    • REFUND: check_refund_policy, get_refund, track_refund
    • SHIPPING_ADDRESS: change_shipping_address, set_up_shipping_address
    • SUBSCRIPTION: newsletter_subscription

    Entities

    The entities covered by the dataset are:

    • {{Order Number}}, typically present in:
      • Intents: cancel_order, change_order, change_shipping_address, check_invoice, check_refund_policy, complaint, delivery_options, delivery_period, get_invoice, get_refund, place_order, track_order, track_refund
    • {{Invoice Number}}, typically present in:
      • Intents: check_invoice, get_invoice
    • {{Online Order Interaction}}, typically present in:
      • Intents: cancel_order, change_order, check_refund_policy, delivery_period, get_refund, review, track_order, track_refund
    • {{Online Payment Interaction}}, typically present in:
      • Intents: cancel_order, check_payment_methods
    • {{Online Navigation Step}}, typically present in:
      • Intents: complaint, delivery_options
    • {{Online Customer Support Channel}}, typically present in:
      • Intents: check_refund_policy, complaint, contact_human_agent, delete_account, delivery_options, edit_account, get_refund, payment_issue, registration_problems, switch_account
    • {{Profile}}, typically present in:
      • Intent: switch_account
    • {{Profile Type}}, typically present in:
      • Intent: switch_account
    • {{Settings}}, typically present in:
      • Intents: cancel_order, change_order, change_shipping_address, check_cancellation_fee, check_invoice, check_payment_methods, contact_human_agent, delete_account, delivery_options, edit_account, get_invoice, newsletter_subscription, payment_issue, place_order, recover_password, registration_problems, set_up_shipping_address, switch_account, track_order, track_refund
    • {{Online Company Portal Info}}, typically present in:
      • Intents: cancel_order, edit_account
    • {{Date}}, typically present in:
      • Intents: check_invoice, check_refund_policy, get_refund, track_order, track_refund
    • {{Date Range}}, typically present in:
      • Intents: check_cancellation_fee, check_invoice, get_invoice
    • {{Shipping Cut-off Time}}, typically present in:
      • Intent: delivery_options
    • {{Delivery City}}, typically present in:
      • Inten...
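    The entities above are written as double-brace placeholders such as {{Order Number}}. This sketch assumes they appear literally in the instruction and response texts. It extracts them, counts them, and fills in known values. The sample sentence and the value "A-1001" are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable

# An entity placeholder such as {{Order Number}}: double braces around any text
# that contains no braces itself.
PLACEHOLDER = re.compile(r"\{\{([^{}]+)\}\}")


def placeholders(text: str) -> list[str]:
    """Return the entity names found in text, in order of appearance."""
    return [name.strip() for name in PLACEHOLDER.findall(text)]


def placeholder_counts(texts: Iterable[str]) -> Counter:
    """Count entity placeholders across many texts."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(placeholders(text))
    return counts


def fill(text: str, values: dict[str, str]) -> str:
    """Substitute known entities; unknown placeholders are left untouched."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1).strip(), m.group(0)), text)


if __name__ == "__main__":
    response = "Your order {{Order Number}} will arrive on {{Date}}."
    print(placeholders(response))
    print(fill(response, {"Order Number": "A-1001"}))  # hypothetical value
```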
  13. AI vs. Human-Generated Images

    • kaggle.com
    zip
    Updated Jan 22, 2025
    Cite
    Alessandra Sala (2025). AI vs. Human-Generated Images [Dataset]. https://www.kaggle.com/datasets/alessandrasala79/ai-vs-human-generated-dataset
    Available download formats: zip (10,475,389,582 bytes)
    Dataset updated
    Jan 22, 2025
    Authors
    Alessandra Sala
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Official dataset for the 2025 Women in AI Kaggle Competition: https://www.kaggle.com/competitions/detect-ai-vs-human-generated-images

    The dataset consists of authentic images sampled from the Shutterstock platform across various categories, including a balanced selection where one-third of the images feature humans. These authentic images are paired with their equivalents generated using state-of-the-art generative models. This structured pairing enables a direct comparison between real and AI-generated content, providing a robust foundation for developing and evaluating image authenticity detection systems.

  14. Generative_AI_Data

    • kaggle.com
    zip
    Updated Jun 15, 2023
    Cite
    Harsh Bansal (2023). Generative_AI_Data [Dataset]. https://www.kaggle.com/datasets/harshbansal27/generative-ai-data
    Available download formats: zip (59,885,608 bytes)
    Dataset updated
    Jun 15, 2023
    Authors
    Harsh Bansal
    Description

    Dataset

    This dataset was created by Harsh Bansal

    Contents

  15. Generative AI Market 2025

    • kaggle.com
    zip
    Updated Sep 28, 2025
    Cite
    Sidraazam (2025). Generative AI Market 2025 [Dataset]. https://www.kaggle.com/datasets/sidraaazam/generative-ai-market-2025
    Available download formats: zip (3,501 bytes)
    Dataset updated
    Sep 28, 2025
    Authors
    Sidraazam
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    The Generative AI Tools – Platforms 2025 dataset is a thorough review of contemporary AI tools and platforms that facilitate the creation of text, images, code, video, and audio. For each tool it records the main use case, target audience, platform type, and category, making it a structured resource for examining the rapidly expanding field of generative AI technology.

    Content

    The collection includes structured data on generative AI platforms and tools available in 2025.

    Context

    One of the most revolutionary technologies of the 2020s, generative AI has developed quickly and is now driving applications in the creation of text, images, videos, code, and audio. Businesses, researchers, and creators are using AI tools at a never-before-seen scale thanks to the emergence of platforms like ChatGPT, MidJourney, Claude, and GitHub Copilot.

    Acknowledgment

    We would like to thank the researchers, technology companies, and members of the worldwide AI community whose work has influenced the generative AI ecosystem. This dataset was assembled from publicly accessible data on top AI platforms, academic studies, business websites, and reliable tech journals.

  16. Generative AI Report

    • kaggle.com
    zip
    Updated Dec 9, 2024
    Cite
    Olfat Sayed (2024). Generative AI Report [Dataset]. https://www.kaggle.com/datasets/olfatsyed/report/discussion
    Available download formats: zip (23,173,809 bytes)
    Dataset updated
    Dec 9, 2024
    Authors
    Olfat Sayed
    Description

    Dataset

    This dataset was created by Olfat Sayed

    Contents

  17. AI-Face-Dataset-3000_Images

    • kaggle.com
    zip
    Updated Aug 26, 2024
    Cite
    Muhammad Shavaiz (2024). AI-Face-Dataset-3000_Images [Dataset]. https://www.kaggle.com/datasets/shavaizbutt/ai-face-dataset-3000-images
    Available download formats: zip (3,972,046,713 bytes)
    Dataset updated
    Aug 26, 2024
    Authors
    Muhammad Shavaiz
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset is a curated subset of 3000 images extracted from a larger collection of approximately 80,000 AI-generated faces. It features diverse, synthetic facial images created using advanced generative models, each with unique characteristics and expressions. Designed for focused testing and smaller-scale machine learning tasks, this subset offers a manageable sample size for experimentation with facial recognition and model validation. For broader applications and comprehensive studies, refer to the full dataset available at Original Dataset.

    To access the images on Kaggle, list the files in the ai-face-dataset-3000-images input directory with `os.listdir('/kaggle/input/ai-face-dataset-3000-images')`. You can then load and process an image with a library such as PIL (Pillow), e.g. `Image.open('/kaggle/input/ai-face-dataset-3000-images/your-image-file.jpg')`.
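    The sketch below builds on those two calls. It keeps only image files; the extension list is an assumption about what the directory contains. It then opens the first image with Pillow if Pillow is installed. The main block first checks that the Kaggle input path exists, so it also runs safely outside a notebook.

```python
from __future__ import annotations

from pathlib import Path

# Extensions treated as images; an assumption about the directory's contents.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(directory: Path | str, recursive: bool = False) -> list[Path]:
    """Return image files under directory, sorted by path."""
    root = Path(directory)
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)


if __name__ == "__main__":
    root = Path("/kaggle/input/ai-face-dataset-3000-images")
    if not root.is_dir():
        print(f"{root} not found; attach the dataset in a Kaggle notebook first")
    else:
        images = list_images(root, recursive=True)
        print(len(images), "images found")
        try:
            from PIL import Image  # Pillow is optional here
        except ImportError:
            Image = None
        if Image is not None and images:
            with Image.open(images[0]) as img:
                print(images[0].name, img.size, img.mode)
```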

  18. ShutterStock Dataset for AI vs Human-Gen. Image

    • kaggle.com
    zip
    Updated Jun 19, 2025
    Cite
    Sachin Singh (2025). ShutterStock Dataset for AI vs Human-Gen. Image [Dataset]. https://www.kaggle.com/datasets/shreyasraghav/shutterstock-dataset-for-ai-vs-human-gen-image
    Available download formats: zip (11,617,243,112 bytes)
    Dataset updated
    Jun 19, 2025
    Authors
    Sachin Singh
    License

    ODC Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    ShutterStock AI vs. Human-Generated Image Dataset

    This dataset is curated to facilitate research in distinguishing AI-generated images from human-created ones, leveraging ShutterStock data. As AI-generated imagery becomes more sophisticated, developing models that can classify and analyze such images is crucial for applications in content moderation, digital forensics, and media authenticity verification.

    Dataset Overview:

    • Total Images: 100,000
    • Training Data: 80,000 images (majority AI-generated)
    • Test Data: 20,000 images
    • Image Sources: A mix of AI-generated images and real photographs or illustrations created by human artists
    • Labeling: Each image is labeled as either AI-generated or human-created
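    The 80,000 training images are described as mostly AI-generated, so a classifier trained on them may lean toward the majority class. One common remedy is to weight the loss by inverse class frequency. This is the same "balanced" heuristic that scikit-learn's `compute_class_weight` uses. It is sketched here as an option, not as anything the dataset prescribes. The label names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Hashable, Iterable


def balanced_class_weights(labels: Iterable[Hashable]) -> dict[Hashable, float]:
    """Inverse-frequency weights w_c = N / (K * n_c).

    N is the number of samples, K the number of classes and n_c the count of
    class c, so every class contributes the same total weight (N / K).
    """
    counts = Counter(labels)
    n_total = sum(counts.values())
    n_classes = len(counts)
    return {c: n_total / (n_classes * n_c) for c, n_c in counts.items()}


if __name__ == "__main__":
    # Hypothetical labels for a skewed training split (not the real ratio).
    labels = ["ai"] * 6 + ["human"] * 2
    print(balanced_class_weights(labels))
```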

    Potential Use Cases:

    • AI-Generated Image Detection: Train models to distinguish between AI and human-made images.
    • Deep Learning & Computer Vision Research: Develop and benchmark CNNs, transformers, and other architectures.
    • Generative Model Evaluation: Compare AI-generated images to real images for quality assessment.
    • Digital Forensics: Identify synthetic media for applications in fake image detection.
    • Ethical AI & Content Authenticity: Study the impact of AI-generated visuals in media and ensure transparency.

    Why This Dataset?

    With the rise of generative AI models like Stable Diffusion, DALL·E, and MidJourney, the ability to differentiate between synthetic and real images has become a crucial challenge. This dataset offers a structured way to train AI models on this task, making it a valuable resource for both academic research and practical applications.

    Explore the dataset and contribute to advancing AI-generated content detection!

    Step 1: Install and Authenticate Kaggle API

    If you haven't installed the Kaggle API, run:

      pip install kaggle

    Then create an API token on your Kaggle account settings page (this downloads kaggle.json) and move it to ~/.kaggle/ (Linux/Mac) or `C:\Users\<YourUser>\.kaggle\` (Windows). On Linux/Mac, restrict its permissions with `chmod 600 ~/.kaggle/kaggle.json`.

    Step 2: Download the Dataset

    The Kaggle CLI reads the credentials from kaggle.json and saves the archive under the dataset's name, so rename it for the next step:

      kaggle datasets download -d shreyasraghav/shutterstock-dataset-for-ai-vs-human-gen-image
      mv shutterstock-dataset-for-ai-vs-human-gen-image.zip dataset.zip

    Step 3: Extract the Dataset

    Once downloaded, extract the dataset using:

      unzip dataset.zip -d dataset_folder

    Now your dataset is ready to use! 🚀

  19. Flickr-Face-HQ and GenAI Dataset (FF-GenAI)

    • kaggle.com
    Updated May 29, 2025
    Cite
    A_rgonaut (2025). Flickr-Face-HQ and GenAI Dataset (FF-GenAI) [Dataset]. https://www.kaggle.com/datasets/argonautex/flickr-face-hq-and-genai-dataset-ff-genai
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 29, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    A_rgonaut
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    The dataset consists of 100k JPG images (50k real and 50k fake) at 224x224 resolution, pre-processed and merged from the following links:

    This dataset is designed to support research at the intersection of computer vision and generative models. By combining high-quality real face images from the Flickr-Faces-HQ (FFHQ) dataset with AI-generated counterparts, this dataset provides a robust foundation for multiple advanced applications:

    GAN Training. With its high resolution and rich visual diversity, the dataset is ideal for training Generative Adversarial Networks (GANs), enabling models to learn realistic facial features across a wide range of demographics and conditions.

    Synthetic Content Detection. The inclusion of both real and generated images makes the dataset particularly suitable for developing and benchmarking algorithms aimed at detecting AI-generated content, a critical task in the age of deepfakes.

    Model Generalization Testing. The variety and complexity of the data offer a reliable benchmark for evaluating how well machine learning models generalize to unseen examples, contributing to the development of more robust and adaptable systems.

  20. Generative AI tools -2025

    • kaggle.com
    zip
    Updated Sep 27, 2025
    Cite
    Ayesha Imran (2025). Generative AI tools -2025 [Dataset]. https://www.kaggle.com/datasets/ayeshaimran123/generative-ai-tools-2025/discussion
    Available download formats: zip (3,501 bytes)
    Dataset updated
    Sep 27, 2025
    Authors
    Ayesha Imran
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    Description:

    This dataset highlights the leading generative AI platforms and tools of 2025, providing key details on their features, applications, and role in shaping future technologies.

    Content:

    This dataset includes details on the main generative AI tools and platforms for 2025, among them platform names, important features, applications, providers, and usage areas.

    Context:

    Generative AI is shaping industries by automating creativity, enhancing productivity, and providing new ways of problem solving. This dataset serves as a reference for understanding the scope of available AI tools.

    Acknowledgment:

    The dataset is compiled from publicly available information on generative AI platforms and tools. Credit goes to the developers, organizations, and community advancing AI research and applications.

    Provenance:

    The information is sourced from official documentation, tech reports, and trusted AI publications. It is created for educational and research purposes.
