License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains a total of 16,737 unique anime. It was created to meet the need for a clean anime dataset. Most of the existing anime datasets I found cover the major titles, but some of them (1) lack the 'Genre' or 'Synopsis' fields, which are helpful for content-based recommendation, (2) contain duplicate data, or (3) represent missing data with inconsistent notations.
Anime_id: anime ID (as per myanimelist.net)
Title: name of the anime
Genre: main genre
Synopsis: brief description
Type
Producer
Studio
Rating: rating of the anime as per myanimelist.net
ScoredBy: total number of users who scored the anime
Popularity: rank of the anime based on popularity
Members: number of members who added the anime to their list
Episodes: number of episodes
Source
Aired
Link
This dataset is a combination of 2 datasets
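The three problems listed above (missing fields, duplicates, inconsistent missing-value notations) can be illustrated with a minimal cleaning sketch. The set of missing-value tokens and the sample rows are assumptions made for illustration; they are not taken from the dataset or from the author's actual cleaning procedure.

```python
import csv
import io

# Hypothetical notations that source datasets might use for missing data.
MISSING_TOKENS = {"", "unknown", "n/a", "na", "none", "-", "?"}

def normalize_missing(value):
    """Return None for any recognised missing-value notation, else the stripped text."""
    if value is None:
        return None
    text = value.strip()
    return None if text.lower() in MISSING_TOKENS else text

def clean_rows(rows, key="Anime_id"):
    """Normalize missing values and keep only the first row seen for each key."""
    seen = set()
    cleaned = []
    for row in rows:
        row = {col: normalize_missing(val) for col, val in row.items()}
        if row[key] is None or row[key] in seen:
            continue  # drop rows without an ID and duplicate IDs
        seen.add(row[key])
        cleaned.append(row)
    return cleaned

# Made-up sample rows, not real dataset content.
sample = io.StringIO(
    "Anime_id,Title,Genre,Episodes\n"
    "1,Show A,Action,26\n"
    "1,Show A,Action,26\n"
    "5,Show B,Unknown,1\n"
    "6,Show C,N/A,-\n"
)
cleaned = clean_rows(csv.DictReader(sample))
```

After cleaning, every missing field is `None`, so downstream code only has to check one notation.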
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset is designed for building and testing anime recommendation systems. It contains 2,000 anime titles and 250 virtual users with generated preference data, making it ideal for both content-based and collaborative filtering approaches.
The main metadata file containing information for each anime.
| Column | Description |
|---|---|
| uid | Unique anime ID |
| title | Anime title |
| link | MyAnimeList link |
| synopsis | Short summary or plot description |
| score | Average community rating |
| ranked | Ranking based on score |
| popularity | Popularity index |
| members | Number of users who interacted with the anime |
| episodes | Total number of episodes |
| genre | Comma-separated genres for the anime |
This file is more convenient for training machine learning models.
| Column | Description |
|---|---|
| uid | Unique anime ID |
| title | Anime title |
| score | Average community rating |
| ranked | Ranking based on score |
| popularity | Popularity index |
| genres | Comma-separated genres for the anime |
The dataset covers 76 genres, offering rich diversity for modeling. Genres include but are not limited to:
Action, Adventure, Comedy, Drama, Fantasy, Mecha, Romance, Sci-Fi, Slice of Life, Supernatural, and many more.
(Full list of 76 genres is included in the dataset metadata.)
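As a rough sketch of how a comma-separated `genre` column could be turned into the one-hot form described above, the helper below builds a sorted genre vocabulary and one binary row per title. The function name and the sample strings are hypothetical; the dataset's own encoding may order or name columns differently.

```python
def one_hot_genres(genre_strings):
    """Return (vocabulary, rows) where each row is a 0/1 list aligned with the vocabulary."""
    vocab = sorted(
        {g.strip() for s in genre_strings for g in s.split(",") if g.strip()}
    )
    index = {genre: i for i, genre in enumerate(vocab)}
    rows = []
    for s in genre_strings:
        vec = [0] * len(vocab)
        for g in s.split(","):
            g = g.strip()
            if g:
                vec[index[g]] = 1
        rows.append(vec)
    return vocab, rows

# Illustrative input, not real dataset rows.
vocab, rows = one_hot_genres(["Action, Comedy", "Drama", "Comedy, Drama"])
```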
This file represents synthetic user profiles, each describing their preference intensity for each genre. Useful for modeling user embeddings or computing user–genre similarity.
| Column | Description |
|---|---|
| user_id | Unique user ID (1–250) |
| # Genre Columns | Each genre’s preference score (0–10 scale) |
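One way to compute the user–genre similarity mentioned above is cosine similarity between a user's genre preference vector and each anime's one-hot genre vector. This is only a sketch: the three-genre subset, the preference values, and the `uid` values are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Columns: Action, Comedy, Drama (an illustrative subset of the genre columns).
user_pref = [8, 0, 6]          # hypothetical preference scores on the 0-10 scale
anime_genres = {               # hypothetical uid -> one-hot genre vector
    101: [1, 0, 0],
    102: [0, 1, 0],
    103: [1, 0, 1],
}

# Rank anime by how well their genres match the user's preferences.
ranked = sorted(
    anime_genres,
    key=lambda uid: cosine(user_pref, anime_genres[uid]),
    reverse=True,
)
```

Cosine similarity ignores the overall magnitude of a user's scores, so a user who rates every genre low is treated the same as one who rates every genre high in the same proportions.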
The user–anime rating matrix used for collaborative filtering or hybrid recommendation.
| Column | Description |
|---|---|
| user_id | Reference to user in list_of_users.csv |
| anime_id | Reference to anime in anime_genre_binary.csv |
| score | Rating given by the user (0–10 scale) |
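For collaborative filtering, the `(user_id, anime_id, score)` rows can be grouped into sparse per-user vectors and compared directly. The sketch below uses plain cosine similarity between users; it is one simple option among many, and the sample ratings are invented.

```python
import math
from collections import defaultdict

def build_user_vectors(ratings):
    """Group (user_id, anime_id, score) rows into {user_id: {anime_id: score}}."""
    vectors = defaultdict(dict)
    for user_id, anime_id, score in ratings:
        vectors[user_id][anime_id] = score
    return vectors

def sparse_cosine(a, b):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = a.keys() & b.keys()
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented ratings: users 1 and 2 agree completely, user 3 shares no titles with them.
ratings = [(1, 10, 3), (1, 20, 4), (2, 10, 3), (2, 20, 4), (3, 30, 5)]
vectors = build_user_vectors(ratings)
sim_1_2 = sparse_cosine(vectors[1], vectors[2])
sim_1_3 = sparse_cosine(vectors[1], vectors[3])
```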
- Content-based or hybrid anime recommendation systems
- Clustering or similarity analysis based on genre and synopsis
- NLP tasks such as synopsis embedding or sentiment classification
- Exploratory Data Analysis (EDA) and visualization

- All data are cleaned and preprocessed for ease of use.
- Missing values have been handled appropriately.
- The one-hot encoded version is optimized for ML pipelines.
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Hello everyone. This dataset consists of detailed information about anime. It has the following columns: Name, Type, Rating, Rank, Description, Tags, and NTags. The dataset can be used for various purposes, including recommendation engines and search engines. I have used the data to build a search engine that can be accessed at https://flask-production-16d0.up.railway.app/
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of user ratings for anime titles. Each user in the dataset has provided at least 5 ratings, ensuring a minimum level of engagement. The dataset includes user ratings and detailed information about each anime, making it suitable for tasks such as recommendation systems and genre-based filtering. The dataset is freshly created, so it covers newer anime. Data is provided in the MovieLens format, except that it has no timestamp column. With minor modifications, the dataset can be used in any recommendation project that uses the MovieLens dataset. I was able to train a BERT model with the https://github.com/jaywonchung/BERT4Rec-VAE-Pytorch project after some small modifications.
Some statistics:
Number of users: 1,774,522
Number of anime: 20,237
Total ratings: 148,170,496
BERT Anime Recommender GitHub repo: https://github.com/MRamazan/AnimeRecBERT
Dataset GitHub repo: https://github.com/MRamazan/User-Animelist-Dataset
Web demo: https://www.animerecbert.online (may be down)
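Because the data lacks the MovieLens timestamp column, tools that expect `userId,movieId,rating,timestamp` need a substitute. The sketch below is one possible workaround, not the author's modification: it uses each user's row position as a stand-in timestamp, assuming file order reflects interaction order (which the source does not confirm). The input header names are also assumptions.

```python
import csv
import io
from collections import defaultdict

def add_order_timestamps(rows):
    """Yield (user, item, rating, timestamp), using a per-user running counter as timestamp."""
    position = defaultdict(int)
    for user_id, anime_id, rating in rows:
        yield user_id, anime_id, rating, position[user_id]
        position[user_id] += 1

# Invented sample in an assumed three-column layout.
source = io.StringIO("userId,animeId,rating\n1,10,8\n1,20,7\n2,10,9\n")
reader = csv.reader(source)
next(reader)  # skip the header row
converted = list(add_order_timestamps(reader))

# Write the result with the header used by MovieLens ratings.csv files.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["userId", "movieId", "rating", "timestamp"])
writer.writerows(converted)
```

Sequential models rely on interaction order, so a constant placeholder timestamp would discard information that a running counter at least preserves within each user.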
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset offers a comprehensive overview of the top animes of 2024, and is useful for building recommendation systems, visualizing trends in anime popularity and score, predicting scores and popularity, and such.
The dataset contains 22 features:
All of the information in this dataset has been gathered by scraping the MyAnimeList website, and is available under the Creative Commons License.
Cover Photo by: Playground.ai
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains three datasets for evaluating accuracy, miscalibration and popularity lift in recommender systems. All datasets contain genre/category information in addition to different user group splits:
Last.fm (lfm.zip), based on the LFM-1b dataset of JKU Linz (http://www.cp.jku.at/datasets/LFM-1b/)
MovieLens (ml.zip), based on MovieLens-1M dataset (https://grouplens.org/datasets/movielens/1m/)
MyAnimeList (anime.zip), based on the MyAnimeList dataset of Kaggle (https://www.kaggle.com/CooperUnion/anime-recommendations-database)
'user_events_cats.txt' contains the users' rating/interaction data along with a list of genres/categories assigned to the rated items. The list of categories is given in 'categories.txt'. Additionally, assignments to three user groups that differ in their inclination to popular/mainstream items are provided: LowPop in 'low_main_users.txt', MedPop in 'med_main_users.txt', and HighPop in 'high_main_users.txt'.
The format of the three user files is "user,mainstreaminess"
The format of the user-events files is "user,item,preference,cats", where different categories are separated by '|'
The format of the categories files is "category-name,index", where index refers to the category-id in the user-events files
Example Python-code for analyzing the datasets as well as empirical results on calibration, popularity lift and accuracy can be found on GitHub: https://github.com/domkowald/FairRecSys
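The file formats described above can be parsed with a few lines of standard-library Python. This is a minimal sketch based only on the format strings given here; the sample lines are invented, and it assumes the `cats` field holds integer category indices and that the files have no header row.

```python
def parse_categories(lines):
    """Parse "category-name,index" lines into {index: name}."""
    index_to_name = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last comma in case a category name itself contains a comma.
        name, idx = line.rsplit(",", 1)
        index_to_name[int(idx)] = name
    return index_to_name

def parse_event(line, index_to_name):
    """Parse one "user,item,preference,cats" line; categories are separated by '|'."""
    user, item, preference, cats = line.strip().split(",", 3)
    cat_ids = [int(c) for c in cats.split("|") if c]
    return {
        "user": user,
        "item": item,
        "preference": float(preference),
        "categories": [index_to_name[c] for c in cat_ids],
    }

# Invented sample content.
categories = parse_categories(["Action,0", "Comedy,1", "Sci-Fi,2"])
event = parse_event("42,1337,8.0,0|2", categories)
```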
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Our dataset comprises comprehensive user preference data gathered from 73,516 avid anime enthusiasts, spanning across 12,294 diverse anime titles. Each individual user has the autonomy to curate their own completed anime list, supplemented with personal ratings reflecting their viewing experience. This rich compilation of user-generated ratings forms the backbone of our dataset, offering invaluable insights into the nuanced preferences and tastes of anime enthusiasts worldwide.
Explore the preferences and behaviors of over 73,000 users, each with their unique anime consumption habits and rating patterns.
Dive into a vast collection of 12,000+ anime titles, ranging from timeless classics to contemporary releases across various genres and themes.
Gain access to users' completed anime lists, providing a glimpse into the breadth and depth of their viewing history.
Uncover users' subjective evaluations of anime titles, quantified through personalized ratings, offering a granular understanding of viewer satisfaction and engagement.
anime_id - unique ID identifying an anime.
name - full name of the anime.
genre - comma-separated list of genres for this anime.
type - movie, TV, OVA, etc.
episodes - how many episodes in this show (1 if movie).
rating - average rating out of 10 for this anime.
members - number of community members that are in this anime's "group".

user_id - non-identifiable, randomly generated user ID.
anime_id - the anime that this user has rated.
rating - rating out of 10 this user has assigned (-1 if the user watched it but didn't assign a rating).
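Since a rating of -1 means "watched but not rated", it should be excluded before averaging; otherwise it drags scores down. The sketch below shows this with invented `(user_id, anime_id, rating)` tuples; the helper name is hypothetical.

```python
from collections import defaultdict

UNRATED = -1  # sentinel described above: watched, but no rating assigned

def mean_ratings(rating_rows):
    """Average explicit ratings per anime, skipping the -1 sentinel."""
    totals = defaultdict(lambda: [0, 0])  # anime_id -> [sum, count]
    for _user_id, anime_id, rating in rating_rows:
        if rating == UNRATED:
            continue
        totals[anime_id][0] += rating
        totals[anime_id][1] += 1
    return {anime_id: total / count for anime_id, (total, count) in totals.items()}

# Invented rows: anime 200 has only an unrated entry, so it gets no mean.
rows = [(1, 100, 8), (2, 100, -1), (3, 100, 6), (1, 200, -1)]
means = mean_ratings(rows)
```

The -1 rows still carry signal as implicit "watched" feedback, so they may be worth keeping in a separate interaction set rather than discarding.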
License: https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Top Popular Anime Dataset is a large, open-source dataset containing information about more than 22,000 animated series and movies collected through the Jikan API for MyAnimeList.
2) Data Utilization
(1) The Top Popular Anime Dataset has the following characteristics:
• The dataset includes unique identifiers, English/Japanese titles, genres, types (TV, movie, etc.), number of episodes, airing status, start/end dates, running time per episode, user rating, rating, age rating, production company, producer, image and trailer URLs, synopsis, etc.
• For some entries, values such as English titles, ratings, trailers, and end-of-air dates may be missing.
(2) The Top Popular Anime Dataset can be used for:
• Development of a recommendation system: a variety of information such as user ratings, genres, and synopses can be used to build a personalized anime recommendation system.
• Trend and genre analysis: analyzing popularity and rating changes over time, by genre, and by production company can reveal trends and success factors in the animation industry.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Four multimedia recommender systems datasets to study popularity bias and fairness:
Last.fm (lfm.zip), based on the LFM-1b dataset of JKU Linz (http://www.cp.jku.at/datasets/LFM-1b/)
MovieLens (ml.zip), based on MovieLens-1M dataset (https://grouplens.org/datasets/movielens/1m/)
BookCrossing (book.zip), based on the BookCrossing dataset of Uni Freiburg (http://www2.informatik.uni-freiburg.de/~cziegler/BX/)
MyAnimeList (anime.zip), based on the MyAnimeList dataset of Kaggle (https://www.kaggle.com/CooperUnion/anime-recommendations-database)
Each dataset consists of user interactions (user_events.txt) and three user groups that differ in their inclination to popular/mainstream items: LowPop (low_main_users.txt), MedPop (med_main_users.txt), and HighPop (high_main_users.txt).
The format of the three user files is "user,mainstreaminess"
The format of the user-events files is "user,item,preference"
Example Python code for analyzing the datasets, as well as more information on the user groups, can be found on GitHub (https://github.com/domkowald/FairRecSys) and on arXiv (https://arxiv.org/abs/2203.00376)
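A simple starting point for studying popularity bias with "user,item,preference" events is to measure how popular each item is and how popular the items in each user's profile are on average. The definitions below (popularity as the fraction of users who interacted with an item) are assumptions for illustration and are not necessarily how the `mainstreaminess` values in the user files were computed.

```python
from collections import defaultdict

def item_popularity(events):
    """Fraction of all users that interacted with each item."""
    users = {user for user, _item, _pref in events}
    users_per_item = defaultdict(set)
    for user, item, _pref in events:
        users_per_item[item].add(user)
    return {item: len(u) / len(users) for item, u in users_per_item.items()}

def mean_profile_popularity(events, popularity):
    """Average popularity of the items in each user's profile."""
    per_user = defaultdict(list)
    for user, item, _pref in events:
        per_user[user].append(popularity[item])
    return {user: sum(vals) / len(vals) for user, vals in per_user.items()}

# Invented events; pass a list, because the events are iterated more than once.
events = [("u1", "a", 5.0), ("u2", "a", 4.0), ("u3", "a", 3.0), ("u3", "b", 5.0)]
popularity = item_popularity(events)
profile_popularity = mean_profile_popularity(events, popularity)
```

Comparing the mean profile popularity across the LowPop, MedPop, and HighPop groups is one way to check that the groups differ as described.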
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
📚 Dataset Summary
This dataset features 1,941 anime character images, neatly organized into 322 folders, each representing a different anime series 🎌.
📦 Size of downloaded files: 152 MB 🪄 Size of auto-converted Parquet files: 151 MB 📊 Split: Train only 🎭 Classes: 322 unique anime titles
Perfect for image classification, anime recommendation systems, and visual style analysis! 🎨✨
🏆 Supported Tasks
🖼️ Image Classification: Predict the anime title based on a… See the full description on the dataset page: https://huggingface.co/datasets/adi2606/Anime_Characters.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Anime Dataset
Dataset Description
This dataset contains comprehensive information about anime series scraped from MyAnimeList (MAL). It includes detailed metadata about 871 (approx) anime series, making it valuable for various NLP tasks, recommendation systems, and cultural analysis. This dataset has NSFW content.
Dataset Summary
Anime Entries: 50 anime series with rich metadata Languages: English and Japanese (titles, descriptions) Format: JSONL (JSON Lines)… See the full description on the dataset page: https://huggingface.co/datasets/realoperator42/anime-titles-dataset.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Waifus and husbando dataset
This dataset contains information about 132,028 characters and the character preferences of 72,629 different users, scraped from anime-planet. In particular, this dataset contains:
The anime data was scraped between June 29th and August 14th.
The "html" folder contains one HTML file per character (132,028 different characters). I uploaded only 2 files as examples to avoid increasing the size of this dataset. All HTML files are available at this link: https://drive.google.com/drive/folders/1Kg0OZ6dEsQuJZVqj1CcTGwDnwp4sNOnW?usp=sharing
user_characters.csv lists every character registered by each user, with a love boolean indicating whether the user loves or hates the character. This file contains 12 million rows, 72,629 different anime and 132,028 different characters. The file has the following columns:
characters_metadata.csv contains general information about every character (132,028 different characters), such as tags, alias, name, gender, etc. This file has the following columns:
Thanks to: 1. Anime Planet for providing anime data.
Experiment with different types of recommenders, for instance collaborative filtering or approaches based on context such as tags, description, etc.
Use this information to build a character recommender system.
Build another dataset on an anime topic.
Try to improve Anime Recommendation Database 2020 with more character data from each anime. This requires extracting the anime ID from every HTML file.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains over 50,000 anime entries with key information useful for building recommendation systems.
Data was collected via the AniList GraphQL API, cleaned, and formatted into a CSV file. This dataset is suitable for NLP tasks, recommendation engines, and data visualization projects.
Inspired by the growing interest in anime recommendation models and the lack of comprehensive, high-quality datasets.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
📺 Anime Watchers Dataset (1960–2025)
📦 Dataset Size: 10,000 Records 📄 Format: CSV 📜 License: CC-BY-4.0
📖 Overview
This dataset contains 10,000 synthetic profiles of anime watchers, spanning from the Classic Era (1960s–1989) to the Modern Era (2010–2025).
It is designed for:
Data Analysis, Machine Learning, Recommendation Systems, and Trend Prediction in anime consumption.
📂 Features
Each record represents an individual anime watcher with detailed… See the full description on the dataset page: https://huggingface.co/datasets/Mikey-TraceGod/Anime-Viewers-Data.
This dataset was created by Dimple Bathija
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
🎌 Ultimate Anime Dataset (8,248 Entries) | 1917-2025
A meticulously curated collection spanning 108 years of anime history
Love this dataset and the Anime Receipts concept? You can download the complete project via the links below:
🚀 Unlock the Full Potential
Product What You Get Get It Here
Tier 1 8,248 Anime Dataset (Parquet)
Tier 2 Full AiMi Recommendation System (Backend + UI)
Tier 3 Ultimate AiMi Recommendation System + AiMi Anime… See the full description on the dataset page: https://huggingface.co/datasets/DivyanshuSingh96/aimi-anime-rag-dataset-sample.
License: https://cdla.io/permissive-1-0/
MyAnimeList is a popular online platform that allows users to create a list of anime and manga they have watched or read, rate them, and write reviews. The MyAnimeList dataset available on Kaggle is a collection of information about various anime titles and their corresponding attributes, such as title, genre, rating, popularity, and episode count. The columns include information about the anime title, the type of anime (TV show, movie, OVA, etc.), the genre(s) it belongs to, the studio that produced it, the source material (whether it is an original work or an adaptation), and the season and year of release.
In addition to the basic information, the dataset also includes ratings and popularity metrics, such as the number of users who have rated the anime and the average rating score, as well as the number of members who have added the anime to their list and the number of favorites. Moreover, the dataset includes information about the anime's episodes, duration, and opening and ending themes.
This dataset could be useful for various applications, such as building recommendation systems, conducting research on anime trends, and analyzing the relationship between various attributes (e.g., genre and popularity). Overall, the MyAnimeList dataset is an invaluable resource for anyone interested in anime and manga, and it provides a wealth of information that can be leveraged for various data-driven analyses.
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset is a comprehensive collection of anime data, fetched from the MyAnimeList website using the Jikan API. It includes information on the latest animes, making it a valuable resource for up-to-date recommendations.
The dataset is primed for building an anime recommendation system, but it can also be used for exploratory data analysis and data cleaning.
Overall, this dataset provides a rich source of information for anime enthusiasts and data scientists alike, offering a solid foundation for developing sophisticated recommendation systems and conducting insightful data analysis. It stands as a testament to the power of data in enhancing user experiences and driving innovation in the entertainment industry.
This dataset contains user preference data from 108,024 users on 15,221 anime. Each user can add anime to their completed list and give it a rating, and this dataset is a compilation of those ratings. It includes anime up to the winter 2020 season. The 108,024 users are anime fans from around the world between the ages of 14 and 34.
Anime.csv
- anime_id - myanimelist.net's unique ID identifying an anime.
- title - full title of the anime.
- genres - comma-separated list of genres for this anime.
- media - movie, TV, OVA, etc.
- episodes - how many episodes in this show (1 if movie or OVA).
- rating - average rating out of 10 for this anime.
- members - number of community members that are in this anime's "group".
- start_date - when this anime started.
- season - what season this anime started.
- source - manga, light_novel, original, etc.

Rating.csv
- user_id - non-identifiable, randomly generated user ID.
- anime_id - the anime that this user has rated.
- rating - rating out of 10 this user has assigned (0 if the user watched it but didn't assign a rating).
Thanks to the myanimelist.net API for providing anime data and user ratings, and thanks to CooperUnion (https://www.kaggle.com/CooperUnion/anime-recommendations-database).
Building a better anime recommendation system based only on user viewing history.
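A recommender based only on viewing history can ignore rating values entirely and treat every Rating.csv row, including rating 0 (watched but unrated), as a "watched" signal. The sketch below counts how often two titles appear in the same user's history and recommends unseen titles that co-occur most with what the user has watched. It is a simple baseline chosen for illustration, with invented sample rows, and scales quadratically with history length.

```python
from collections import defaultdict
from itertools import combinations

def build_histories(rating_rows):
    """Map each user to the set of anime they watched, regardless of rating value."""
    histories = defaultdict(set)
    for user_id, anime_id, _rating in rating_rows:
        histories[user_id].add(anime_id)
    return histories

def co_occurrence(histories):
    """Count how many users watched each unordered pair of anime."""
    counts = defaultdict(int)
    for watched in histories.values():
        for a, b in combinations(sorted(watched), 2):
            counts[(a, b)] += 1
    return counts

def recommend(user_id, histories, counts, top_n=5):
    """Rank unseen anime by total co-occurrence with the user's watched titles."""
    seen = histories.get(user_id, set())
    scores = defaultdict(int)
    for (a, b), n in counts.items():
        if a in seen and b not in seen:
            scores[b] += n
        elif b in seen and a not in seen:
            scores[a] += n
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_n]

# Invented rows in (user_id, anime_id, rating) form; 0 means watched but unrated.
rows = [(1, 10, 8), (1, 20, 0), (2, 10, 9), (2, 20, 7), (2, 30, 0), (3, 10, 6), (3, 30, 10)]
histories = build_histories(rows)
counts = co_occurrence(histories)
suggestions = recommend(1, histories, counts)
```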
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Based on: https://www.kaggle.com/datasets/tavuksuzdurum/user-animelist-dataset
Cleaned_animelist: Consists of Anime ID, titles, type of content, year of release, score (average ratings from all the users), amount of episodes, the MyAnimeList URL, and sequel.
ratings_df: unique UserID giving Ratings (/10) to each unique AnimeID, also includes the embeddings for users and animes.
This dataset includes fresh and newer animes available, usable for data analysis and identifying trends.