Facebook
TwitterThis dataset was created by Niyamat Ullah
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset offers a thorough summary of the platforms and tools for generative AI that will be accessible through 2025. The tool names, businesses, release years, categories, modalities (text, image, video, audio, code, multimodal), open-source status, and API availability are all covered in detail. Researchers, students, data analysts, and AI enthusiasts can use the dataset to better understand the quickly expanding field of artificial intelligence.
By using this dataset, you can:
This dataset is a useful tool for academic and professional settings since it may be used in data analytics, business insights, career research, and AI innovation studies.
Facebook
Twitterhttp://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
This dataset provides a collection of popular Generative AI tools that are shaping the future of creativity, productivity, and automation at high level. It includes details such as tool name ,company, category, modality_canonical, open source, api_available, api_status, release year, category, (e.g., text, image, video, music, code), and key features.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by SSRETRO
Released under MIT
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Google Gemini is an AI model that can work across text, images, video, audio, and code. It's designed to be multimodal, and is the first model to outperform human experts on Massive Multitask Language Understanding (MMLU).
Facebook
TwitterThe quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness.
CIFAKE is a dataset that contains 60,000 synthetically-generated images and 60,000 real images (collected from CIFAR-10). Can computer vision techniques be used to detect when an image is real or has been generated by AI?
Further information on this dataset can be found here: Bird, J.J. and Lotfi, A., 2024. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access.
The dataset contains two classes - REAL and FAKE.
For REAL, we collected the images from Krizhevsky & Hinton's CIFAR-10 dataset
For the FAKE images, we generated the equivalent of CIFAR-10 with Stable Diffusion version 1.4
There are 100,000 images for training (50k per class) and 20,000 for testing (10k per class)
The dataset and all studies using it are linked using Papers with Code https://paperswithcode.com/dataset/cifake-real-and-ai-generated-synthetic-images
If you use this dataset, you must cite the following sources
Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images.
Bird, J.J. and Lotfi, A., 2024. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access.
Real images are from Krizhevsky & Hinton (2009), fake images are from Bird & Lotfi (2024). The Bird & Lotfi study is available here.
The updates to the dataset on the 28th of March 2023 did not change anything; the file formats ".jpeg" were renamed ".jpg" and the root folder was uploaded to meet Kaggle's usability requirements.
This dataset is published under the same MIT license as CIFAR-10:
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Facebook
TwitterThis dataset was created by Sourav Kumar Khawas
Released under Other (specified in description)
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Description This dataset contains opinions about Generative AI collected from social media platform Twitter. The data was gathered from 2021 to 2024 using various relevant keywords, such as: - generative ai opinion - generative ai thought - generative ai controversy - generative ai impact - generative ai ethics - generative ai risks - generative ai vs human - ai generated misinformation - generative ai backlash - generative ai opportunities - generative ai policy - #AIFailure - ai helps human - future of generative ai
The dataset aims to provide insights into public opinions about Generative AI, highlighting both its opportunities and challenges, and is expected to be valuable for research or analysis on public opinion, ethics, policies, and the impacts of Generative AI.
Citation Falif, Muhammad Sya'bani. Generative AI Opinion Dataset on Twitter. 2024. Kaggle.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset offers a carefully chosen summary of the 113 generative AI platforms and technologies that will be accessible in 2025. The firm, category, year of release, accessibility (API and open-source availability), and supported modalities (text, image, video, audio, code, design, productivity, safety, infrastructure, multimodal) are all covered in detail in each entry.
Researchers, developers, and analysts can use the dataset to evaluate platforms based on their features and accessibility, follow trends across many categories, and gain a better understanding of the dynamic AI ecosystem.
Tool Details:Name, business, domain, and website
Modalities and Categories:Canonical flags for modality and category
Accessibility:API availability and status, open-source status
Timeline:Years since release, year of release
Capabilities:Support for text, image, video, audio, code, design, productivity, infrastructure, safety, and multimodal
Facebook
TwitterWith the exponential rise in the consumption of audio-visual content, rapid video content creation has become a quintessential need. At the same time, making these videos accessible in different languages is also a key challenge. For instance, a deep learning lecture series, a famous movie, or a public address to the nation, if translated to desired target languages, can become accessible to millions of new viewers. A crucial aspect of translating such talking face videos or creating new ones is correcting the lip sync to match the desired target speech. Consequently, lip-syncing talking face videos to match a given input audio stream has received considerable attention in the research community.
This is usage of the Wav2Lip model, a state-of-the-art lip-syncing tool that can generate highly accurate and realistic lip-synced videos from arbitrary input audio and video sources.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Joko Slamet
Released under Apache 2.0
Facebook
Twitterhttps://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/
This dataset can be used to train Large Language Models such as GPT, Llama2 and Falcon, both for Fine Tuning and Domain Adaptation.
The dataset has the following specs:
The categories and intents have been selected from Bitext's collection of 20 vertical-specific datasets, covering the intents that are common across all 20 verticals. The verticals are:
For a full list of verticals and its intents see https://www.bitext.com/chatbot-verticals/.
The question/answer pairs have been generated using a hybrid methodology that uses natural texts as source text, NLP technology to extract seeds from these texts, and NLG technology to expand the seed texts. All steps in the process are curated by computational linguists.
The dataset contains an extensive amount of text data across its 'instruction' and 'response' columns. After processing and tokenizing the dataset, we've identified a total of 3.57 million tokens. This rich set of tokens is essential for training advanced LLMs for AI Conversational, AI Generative, and Question and Answering (Q&A) models.
Each entry in the dataset contains the following fields:
The categories and intents covered by the dataset are:
The entities covered by the dataset are:
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Official dataset for the 2025 Women in AI Kaggle Competition: https://www.kaggle.com/competitions/detect-ai-vs-human-generated-images
The dataset consists of authentic images sampled from the Shutterstock platform across various categories, including a balanced selection where one-third of the images feature humans. These authentic images are paired with their equivalents generated using state-of-the-art generative models. This structured pairing enables a direct comparison between real and AI-generated content, providing a robust foundation for developing and evaluating image authenticity detection systems.
Facebook
TwitterThis dataset was created by Harsh Bansal
Facebook
TwitterODC Public Domain Dedication and Licence (PDDL) v1.0http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Description
A thorough review of contemporary AI tools and platforms that facilitate the creation of text, images, code, videos, and audio can be found in the Generative AI Tools – Platforms 2025 dataset. It contains information on the main use case, target audience, platform type, and category of each tool. In 2025, this dataset will be a structured resource for examining the quickly expanding field of generative AI technology.
Content
The collection includes organized data on generative AI platforms and tools that will be accessible in 2025.
Context
One of the most revolutionary technologies of the 2020s, generative AI has developed quickly and is now driving applications in the creation of text, images, videos, code, and audio. Businesses, researchers, and creators are using AI tools at a never-before-seen scale thanks to the emergence of platforms like ChatGPT, MidJourney, Claude, and GitHub Copilot.
Acknowledgment
We would like to thank the researchers, technology companies, and members of the worldwide AI community whose work has influenced the generative AI ecosystem. This dataset was assembled from publicly accessible data on top AI platforms, academic studies, business websites, and reliable tech journals.
Facebook
TwitterThis dataset was created by Olfat Sayed
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset is a curated subset of 3000 images extracted from a larger collection of approximately 80,000 AI-generated faces. It features diverse, synthetic facial images created using advanced generative models, each with unique characteristics and expressions. Designed for focused testing and smaller-scale machine learning tasks, this subset offers a manageable sample size for experimentation with facial recognition and model validation. For broader applications and comprehensive studies, refer to the full dataset available at Original Dataset.
To access images in the ai-face-dataset-3000-images directory on Kaggle, list the files using os.listdir('/kaggle/input/ai-face-dataset-3000-images'). You can then load and process an image using libraries like PIL with Image.open('/kaggle/input/ai-face-dataset-3000-images/your-image-file.jpg').
Facebook
Twitterhttp://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
ShutterStock AI vs. Human-Generated Image Dataset
This dataset is curated to facilitate research in distinguishing AI-generated images from human-created ones, leveraging ShutterStock data. As AI-generated imagery becomes more sophisticated, developing models that can classify and analyze such images is crucial for applications in content moderation, digital forensics, and media authenticity verification.
With the rise of generative AI models like Stable Diffusion, DALL·E, and MidJourney, the ability to differentiate between synthetic and real images has become a crucial challenge. This dataset offers a structured way to train AI models on this task, making it a valuable resource for both academic research and practical applications.
Explore the dataset and contribute to advancing AI-generated content detection!
If you haven't installed the Kaggle API, run:
bash
pip install kaggle
Then, download your kaggle.json API key from Kaggle Account and move it to ~/.kaggle/ (Linux/Mac) or `C:\Users\YourUser.kaggle` (Windows).
wget --no-check-certificate --header "Authorization: Bearer $(cat ~/.kaggle/kaggle.json | jq -r .token)" "https://www.kaggle.com/datasets/shreyasraghav/shutterstock-dataset-for-ai-vs-human-gen-image" -O dataset.zip
Once downloaded, extract the dataset using:
bash
unzip dataset.zip -d dataset_folder
Now your dataset is ready to use! 🚀
Facebook
TwitterAttribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
The dataset consists of 100k JPG images (50k real and 50k- fake) at 224x224 resolution pre-processed and merged by the following links:
This dataset is designed to support research at the intersection of computer vision and generative models. By combining high-quality real face images from the Flickr-Faces-HQ (FFHQ) dataset with AI-generated counterparts, this dataset provides a robust foundation for multiple advanced applications:
GAN Training. With its high resolution and rich visual diversity, the dataset is ideal for training Generative Adversarial Networks (GANs), enabling models to learn realistic facial features across a wide range of demographics and conditions.
Synthetic Content Detection. The inclusion of both real and generated images makes the dataset particularly suitable for developing and benchmarking algorithms aimed at detecting AI-generated content, a critical task in the age of deepfakes.
Model Generalization Testing. The variety and complexity of the data offer a reliable benchmark for evaluating how well machine learning models generalize to unseen examples, contributing to the development of more robust and adaptable systems.
Facebook
TwitterODC Public Domain Dedication and Licence (PDDL) v1.0http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
This dataset highlights the leading genrative AI platforms and tools of 2025, providing key details on the features, applications, and role in shaping future technologies.
This dataset includes detals on the main generative AI tools and platforms for 2025. Platform names, important fetures, aplications, providers, and usage area are among the details it contain.
Generative AI is shaping industres by automating creatvity, enhancing productvity, and providing new ways of problem solving. This dataset serve as a refrence for understanding the scope of available AI tools.
The dataset is compiled from publicaly available information on generative AI platforms and tools. Credit goes to developrs, organizations, and community advancing AI research and applications.
The information is source from oficial documentation, tech reports, and truste AI publications. It is cerated for educational and research purposes.
Facebook
TwitterThis dataset was created by Niyamat Ullah