26 datasets found

Anime Quest Dataset
kaggle.com
Updated Jun 28, 2023
Cite
Md Yasmi Tohabar Evon (2023). Anime Quest Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/6045074
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/6045074
Dataset updated
Jun 28, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Md Yasmi Tohabar Evon
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

This dataset contains information about Anime scraped from Anime Planet on 28/06/2023. It contains information about anime (episodes, aired date, rating, genre, etc.), and favorite anime based on the countries and top countries that watch the most anime.

The scraping program for this dataset is in the Anime.Quest GitHub repository.

Tableau visualization of this dataset can also be found in Anime Quest: Visualization.

Content

The dataset contains 3 files:

📁 anime_data.csv:
1. Name: Full name of the anime
2. Media Type: TV, Web, Movie, etc.
3. Episodes: Total episodes of the anime
4. Studio: Name of the studios of the anime, from most recent to oldest.
5. Start Year: Release Year of the anime
6. End Year: Last year of the anime airing
7. Ongoing: Is the anime currently airing or not? True or False.
8. Release Season: Spring, Fall, Winter, and Summer
9. Rating: The global rating ranges from 0 to 5.
10. Rank: Global ranking of the anime
11. Members: Total members of the anime
12. Genre: The category of the anime
13. Creator: Creator of the anime

📁 anime_top_by_country_data.csv:
1. Country: Individual country name
2. Most Popular: The most popular anime in the country
3. 2nd Place: Second-most popular anime in the country
4. 3rd Place: Third-most popular anime in the country
5. 4th Place: Fourth-most popular anime in the country
6. 5th Place: The fifth-most popular anime in the country

📁 anime_watching_data.csv:
1. Rank: Ranking of countries based on the number of anime viewers
2. Country: Individual country name
3. Population: Total population of the country
4. Percentage of People Watching: Percentage of people watching anime in the country
5. Number of People Watching: Total number of people watching anime in the country

Acknowledgements

The website Anime Planet was used to scrape this dataset. Please include citations for this dataset if you use it in your own research.

Inspiration

This dataset can be used to find the factors determining an anime's rating and ranking. Additionally, it can be used to make anime recommendations and to explore patterns across anime.
Anime Dataset
kaggle.com
Updated Jul 22, 2022
Cite
Tarun R Jain (2022). Anime Dataset [Dataset]. https://www.kaggle.com/datasets/tarundalal/anime-dataset
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 22, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Tarun R Jain
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Anime Vyuh: A World Of Anime Dataset

Explore the Anime Dataset, which consists of anime data for the last 10 years, including the Summer, Winter, and Spring seasons. Column information:
- Anime = Includes Anime Title
- Genre = Includes Anime Genre
- Description = Synopsis of Anime
- Studio = The Animation Studio
- Year = Release Year along with date and month
- Rating = In terms of stars.

GitHub Repository contains how the data was created along with EDA and Recommendation System Machine Learning. Do Star it and open a pull request for any suggestions: https://github.com/lucifertrj/AnimeWorldDataset_HUB

Join our Community

https://discord.com/invite/kxZYxdTKp6

Credits: MyAnimeList is the website from where I scraped the Dataset.
Top 10000 Anime Movies ,OVA's and Tv-Shows
kaggle.com
zip
Updated Apr 11, 2021
Cite
Thomas Konstantin (2021). Top 10000 Anime Movies ,OVA's and Tv-Shows [Dataset]. https://www.kaggle.com/thomaskonstantin/top-10000-anime-movies-ovas-and-tvshows
Explore at:
Available download formats: zip (2226960 bytes)
Dataset updated
Apr 11, 2021
Authors
Thomas Konstantin
Description
Context

Anime, style of animation popular in Japanese films. Early anime films were intended primarily for the Japanese market and, as such, employed many cultural references unique to Japan. For example, the large eyes of anime characters are commonly perceived in Japan as multifaceted “windows to the soul.” Much of the genre is aimed at children, but anime films are sometimes marked by adult themes and subject matter. Modern anime began in 1956 and found lasting success in 1961 with the establishment of Mushi Productions by Osamu Tezuka, a leading figure in modern manga, the dense, novelistic Japanese comic book style that contributed greatly to the aesthetic of anime. - "britannica.com"

Content

The dataset contains information regarding the 10000 most common and known anime series, OVA's and movies. For each entry, the rating, synopsis or description, air dates, and type are recorded.

Acknowledgements

All rights and credit for the information are reserved to https://myanimelist.net/, which provides the amazingly detailed information about the anime in this list. I do not own any rights for commercial use of this data.

Inspiration

With more and more people starting to watch and enjoy anime, and platforms such as Netflix adding more and more anime movies and shows, a natural question to ask is: what makes a good anime series? Is there any pattern that makes certain anime better than others? How different are the plots?
Data from: Self-relevance of anime, sociability, and individual and...
jstagedata.jst.go.jp
txt
Updated Jul 27, 2023
+ more versions
Cite
Ryohei Kitazawa; Akinori Ono (2023). Self-relevance of anime, sociability, and individual and collective ownerships [Dataset]. http://doi.org/10.50998/data.marketing.23531094.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.50998/data.marketing.23531094.v1
Dataset updated
Jul 27, 2023
Dataset provided by
Japan Marketing Academy
Authors
Ryohei Kitazawa; Akinori Ono
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These data contain (1) type of manipulated anime consumers (fans/manias/otaku), (2) two variables for the manipulation check, namely (2a) anime relevance to the self, measured by six items, “That anime and I have a lot in common,” “That anime is central to my identity,” “That anime is part of who I am,” “I derived some of my identity from that anime,” “That anime helps me to achieve the identity I wished to have,” “That anime helps me to narrow the gap between what I am and what I try to be” (0-100) and (2b) sociability, measured by five items, “I like to be with people,” “I welcome the opportunity to mix socially with people,” “I prefer working with others rather than alone,” “I find people more stimulating than anything else,” “I'd be unhappy if I were prevented from making many social contacts” (1-7), (3) consumer individual psychological ownership of anime content, measured by four items, “This is MY anime,” “I sense that this anime is MINE,” “I feel a very high degree of personal ownership for this anime,” “When I watch this anime it feels as though I own it” (1-7), and (4) consumer collective psychological ownership of the anime content, which is measured by three items, “Other consumers and I collectively sense that this anime is OURS,” “Other consumers and I collectively feel a very high degree of shared ownership for this anime,” “Most consumers that watch this anime feel as though they own the anime” (1-7).
MyAnimeList
kaggle.com
zip
Updated Sep 15, 2020
Cite
Quang-Vinh Do (2020). MyAnimeList [Dataset]. https://www.kaggle.com/datasets/qvinhdo/myanimelist
Explore at:
Available download formats: zip (943068300 bytes)
Dataset updated
Sep 15, 2020
Authors
Quang-Vinh Do
Description
Context

This dataset contains a collection of animes, users, and ratings scraped from MyAnimeList.net using their official API along with the Jikan API. A more detailed dataset of animelist ratings can be found at https://www.kaggle.com/azathoth42/myanimelist, however we just rescraped the animelists again to get more updated information. We did however use the list of user_ids from https://www.kaggle.com/azathoth42/myanimelist to start scraping animelists. This dataset was last updated on September 14, 2020.

Content

This dataset contains 4 files.
- animes.csv: 17,058 animes containing information on title, anime_id, airing status, number of episodes, and synopsis.
- users.csv: 302,674 users containing simple user information like username, gender, location, birthdate and join date.
- user_watches.csv: 68,235,827 user animelist ratings with the score and watch status.
- mal_db.dump: Dump file of postgresql database containing all 3 csv information above along with proper PK/FK, other constraints, and indexes.

Note the meanings of certain values.

For the status for animes:
- 1: Currently Airing
- 2: Finished Airing
- 3: Not yet aired

For the status for user_watches ratings:
- 1: watching
- 2: completed
- 3: on hold
- 4: dropped
- 6: plan to watch
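
The status codes above can be turned into readable labels with a small lookup. This is only a sketch: the dictionary and function names are hypothetical, and returning "unknown" for unlisted codes (such as 5, which the description does not define) is an assumption rather than documented behaviour.

```python
# Status codes exactly as listed in the dataset description.
ANIME_STATUS = {
    1: "Currently Airing",
    2: "Finished Airing",
    3: "Not yet aired",
}
WATCH_STATUS = {
    1: "watching",
    2: "completed",
    3: "on hold",
    4: "dropped",
    6: "plan to watch",
}


def decode_status(code, table):
    """Return the label for a status code, or 'unknown' if the code is not listed."""
    return table.get(code, "unknown")


print(decode_status(2, ANIME_STATUS))  # Finished Airing
print(decode_status(5, WATCH_STATUS))  # unknown (5 is not defined in the description)
```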

Acknowledgements

https://www.kaggle.com/azathoth42/myanimelist for UserList.csv
MyAnimelist Jikan Database
kaggle.com
zip
Updated Jul 23, 2022
Cite
Andreu Vall Hernàndez (2022). MyAnimelist Jikan Database [Dataset]. https://www.kaggle.com/datasets/andreuvallhernndez/myanimelist-jikan
Explore at:
Available download formats: zip (43830799 bytes)
Dataset updated
Jul 23, 2022
Authors
Andreu Vall Hernàndez
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Jikan is a PHP & REST API for MyAnimeList. It has two main parts: a PHP library, which has several methods to scrape and parse a lot of data from MyAnimeList's desktop version; and a REST API, which uses the previous PHP library and provides a public API to obtain certain data from MyAnimeList (in json format).

To avoid overloading MyAnimeList, the REST API uses an internal MongoDB Database to store and cache previously scraped data. Some entries are updated automatically once/day, others only when asked from the REST API. This dataset consists of the scraping of the 4 main collections from the REST API cached database: Animes, Characters, Mangas and People.

The scraping was done on 17 July 2022 and it took slightly less than 3 hours 30 minutes. The scraping process is really really simple and is uploaded in GitHub.

It contains the information of:
- 24 640 Animes
- 146 049 Characters
- 66 371 Mangas
- 16 943 People

The cleaning process is a bit longer and it's also explained in the GitHub. Basically it consists in simplifying dictionary columns, adjusting some old values and adding two new columns (nsfw and pending_approval).

In the near future I'll post a more complete Dataset relating the Characters & Staff with Anime and Manga and the Relations between Animes and Mangas, and I'll be updating that weekly, but that version will have a lot more complicated code and take a lot longer to scrape (over 1 day, and I will scrape too the MyAnimeList official API to know which Animes / Mangas have been updated to update only modified entries), so this is the preliminary and beautifully simple Jikan only version.

Thanks a lot to Jikan API, studying their API architecture was quite fun, and the scraped data from MyAnimeList is awesome.
MyAnimeList Anime & Manga Dataset (July 2025)
kaggle.com
zip
Updated Aug 27, 2025
Cite
Hamza Ashfaq (2025). MyAnimeList Anime & Manga Dataset (July 2025) [Dataset]. https://www.kaggle.com/datasets/hamzaashfaque1999/myanimelist-scraped-data
Explore at:
Available download formats: zip (27845136 bytes)
Dataset updated
Aug 27, 2025
Authors
Hamza Ashfaq
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Scraped dataset of an anime and manga database known as MyAnimeList.net. Contains two .csv files with 105,983 scraped entries total consisting of 28,635 anime entries and 77,348 manga entries

Schema for anime_entries.csv

id: Unique identifier assigned by the website.
link: URL for the entry.
title_name: Title of the entry.
score: Weighted average of scores given by users.
scored_by: Number of people who scored the entry.
ranked: Ranking of an entry.
popularity: Rank of popularity compared to other shows with 1st being highest, decided by number of members.
members: Number of users that have interacted with the entry.
favorited: Number of users that have favorited the show.
synonymns: Other titles by which the show is referred.
japanese_name: Name in Japanese.
english_name: Name in English.
german_name: Name in German.
french_name: Name in French.
spanish_name: Name in Spanish.
item_type: Indicates whether an item is a TV series, a movie, an OVA, an ONA or a Special.
episodes: Number of episodes in a given entry. Some entries may have this as "unknown".
status: Indicates whether the show is Currently Airing, Finished Airing, or Not Aired Yet.
airing_date: Date when show aired.
premier_date: Date when show premiered.
broadcast_date: Days at which show would be broadcasted.
producers: Parties responsible for the management of the anime production.
licensors: Parties responsible for the distribution and servicing of the anime.
studios: Parties responsible for the production of animation.
source: The original material i.e. manga, light novel etc. from which the anime has been adapted.
genres: What genres the anime can be categorized into.
themes: What themes occur within the anime.
demographic: The demographic the anime is marketed to.
duration: Length of an episode.
age_rating: Indicates whether the entry is rated G, PG, PG-13, R, Rx.
description: Description of the anime.
background: Background of the anime.

Schema for manga_entries.csv

id: Unique identifier assigned by the website.
link: URL for the entry.
title_name: Title of the entry.
score: Weighted average of scores given by users.
scored_by: Number of people who scored the entry.
ranked: Ranking of an entry.
popularity: Rank of popularity compared to other mangas with 1st being highest, decided by number of members.
members: Number of users that have interacted with the entry.
favorited: Number of users that have favorited the manga.
synonymns: Other titles by which the manga is referred.
japanese_name: Name in Japanese.
english_name: Name in English.
german_name: Name in German.
french_name: Name in French.
spanish_name: Name in Spanish.
item_type: Indicates whether an item is a Manga, One-shot, Doujinshi, Light-Novel, Novel, Manhwa or Manhua.
volumes: Total number of volumes in the series.
chapters: Total number of chapters in the entire series.
status: Indicates whether the manga is Finished, Publishing, On Hiatus or Discontinued.
publishing_date: Dates when the manga is being published.
authors: Party responsible for authoring the manga.
serialization: Party responsible for the distribution of the manga.
genres: What genres the manga can be categorized into.
themes: What themes occur within the manga.
demographic: The demographic the manga is marketed to.
description: Description of the manga.
background: Background of the manga.

Important

Columns with lists are json dumps due to their dynamic nature.
The "background" column in both datasets may be faulty or incomplete due to inconsistent html on the website.
Because of the time it takes to scrape MAL data and the MAL website updating its entries, some of the data in "ranked" and "popularity" columns may be duplicated or shuffled.
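
Since list-valued columns are stored as JSON dumps, each cell has to be decoded before use. The following is a minimal sketch using only the standard library; the helper name and the sample row are hypothetical, and falling back to an empty list for blank or malformed cells is an assumption, not something the dataset description specifies.

```python
import json


def parse_list_column(cell):
    """Decode a JSON-dumped list cell; return [] for empty or malformed cells."""
    if not cell:
        return []
    try:
        value = json.loads(cell)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return []
    # Wrap a scalar so callers always receive a list.
    return value if isinstance(value, list) else [value]


# Hypothetical row; real column contents may differ.
row = {"genres": '["Action", "Fantasy"]', "themes": ""}
print(parse_list_column(row["genres"]))  # ['Action', 'Fantasy']
print(parse_list_column(row["themes"]))  # []
```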
SoulWorker - Anime Action MMO Player Activity Dataset
mmo-population.com
csv, json
Cite
MMO Populations, SoulWorker - Anime Action MMO Player Activity Dataset [Dataset]. https://mmo-population.com/game/soulworker-anime-action-mmo
Explore at:
Available download formats: csv, json
Dataset authored and provided by
MMO Populations
License
https://mmo-population.com/terms
Time period covered
Oct 1, 2023 - Sep 1, 2025
Variables measured
date, index, trend_pct, source_steam, model_version, source_reddit, source_twitch, confidence_pct, players_bridged, players_enhanced, and 1 more
Description
SoulWorker - Anime Action MMO player activity dataset from MMO Populations, combining monthly enhanced players and 30-day daily estimates generated from public signals.
open-image-preferences-v1-more-results-binarized
huggingface.co
Updated Dec 19, 2024
+ more versions
Cite
Rapidata (2024). open-image-preferences-v1-more-results-binarized [Dataset]. https://huggingface.co/datasets/Rapidata/open-image-preferences-v1-more-results-binarized
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 19, 2024
Dataset authored and provided by
Rapidata
Description
We wanted to contribute to the challenge posed by the data-is-better-together community (description below). We collected 170'000 preferences using our API from people all around the world in roughly 3 days (docs.rapidata.ai). If you get value from this dataset and would like to see more in the future, please consider liking it.

Dataset Card for image-preferences-results Original Prompt: Anime-style concept art of a Mayan Quetzalcoatl biomutant, dystopian world… See the full description on the dataset page: https://huggingface.co/datasets/Rapidata/open-image-preferences-v1-more-results-binarized.
anime rating
kaggle.com
zip
Updated Oct 26, 2023
Cite
Mariyam Al Shatta (2023). anime rating [Dataset]. https://www.kaggle.com/datasets/mariyamalshatta/anime-rating
Explore at:
Available download formats: zip (1008979 bytes)
Dataset updated
Oct 26, 2023
Authors
Mariyam Al Shatta
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Business Context

Streaming media services facilitate on-demand or real-time presentation and distribution of audio, video, and multimedia content across a communications route without downloading the files to their systems. This saves users time and storage, and at the same time provides the media owners with built-in copy protection. In today's digital space, streaming has become an influential medium for accessing information. Improved connectivity and advancement in technology have made streaming services accessible to almost everyone having an internet connection, and the surging demand for on-demand entertainment services such as entertainment programs and live matches is boosting the adoption of streaming media services globally.

Streamist is a streaming company that streams web series and movies to a worldwide audience. Every content on their portal is rated by the viewers, and the portal also provides other information for the content like the number of people who have watched it, the number of people who want to watch it, the number of episodes, duration of an episode, etc.

Objective

Streamist is currently focusing on the anime available in their portal and wants to identify the most important factors involved in rating an anime. As a data scientist at Streamist, you are tasked with analyzing the portal's anime data and identifying the important factors by building a predictive model to predict the rating of an anime.

Data Dictionary

Each record in the database provides a description of an anime. A detailed data dictionary can be found below.

title: title of the anime
mediaType: format of publication
eps: number of episodes (movies are considered 1 episode)
duration: duration of an episode in minutes
startYr: the year that airing started
finishYr: the year that airing finished
description: the synopsis of the plot
contentWarn: content warning
watched: number of users that completed it
watching: number of users that are watching it
rating: average user rating
votes: number of votes that contribute to the rating
studio_primary: studios responsible for creation
studios_colab: whether there was a collaboration between studios for anime production
genre: genre to which the anime belongs
Anime Ratings
kaggle.com
zip
Updated May 2, 2025
Cite
Aphotinel Onagwa (2025). Anime Ratings [Dataset]. https://www.kaggle.com/datasets/aphotinel/anime-ratings/data
Explore at:
Available download formats: zip (22038 bytes)
Dataset updated
May 2, 2025
Authors
Aphotinel Onagwa
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains information about 1,000 anime, including their title, genre, number of episodes, type (TV, Movie, OVA), user rating, and the number of members who rated each anime. It provides a snapshot of popular anime across different genres and formats, useful for analysis or building recommendation systems.
Top 100 Anime - AnimeList
kaggle.com
zip
Updated Jun 18, 2022
Cite
Naman Srivastava (2022). Top 100 Anime - AnimeList [Dataset]. https://www.kaggle.com/datasets/srivnaman/top-100-anime-animelist
Explore at:
Available download formats: zip (3055 bytes)
Dataset updated
Jun 18, 2022
Authors
Naman Srivastava
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

People across the globe love to watch anime. It has everything from emotions, drama, and romance to mind-blowing action. Among the hundreds of anime shows and series out there, which is best?

Content

This dataset contains information on the top 100 anime shows aired on TV according to myanimelist.com
MyAnimeList User Ratings + Anime Dataset
kaggle.com
zip
Updated Dec 8, 2023
Cite
andrewgatchalian (2023). MyAnimeList User Ratings + Anime Dataset [Dataset]. https://www.kaggle.com/datasets/andrewgatchalian/myanimelist-user-ratings
Explore at:
Available download formats: zip (356047866 bytes)
Dataset updated
Dec 8, 2023
Authors
andrewgatchalian
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Context

This dataset contains over 44 million rows of anime-related data, representing interactions from 18,145 unique users and 16,135 unique shows.

Source

The data was sourced directly from the website MyAnimeList, one of the largest anime-driven forums that allows users to track, rate, and review anime.

The dataset contains 4 files:
- anime_titles.csv is a list of all available anime titles from MyAnimeList (IDs 1 to ~52,000). The data was pulled using MyAnimeList's API (last updated 11/2023).
- anime_user_ratings.csv contains over 40 million ratings from over 18k users, pulled from the MyAnimeList API. Data was cleaned to only include: 'Completed' & 'Dropped' status and non-zero ratings (removed '0' ratings since there was overlap with users who did not rate titles). NOTE: Users are a random sample from over 70k unique usernames we collected. Usernames were web scraped from the MyAnimeList Recently Online Users tab over the span of November 2023.
- anime_genres.csv is a file containing dummy variables for all available genres in our dataset (for modeling purposes).
- username_list_full.csv is a list of over 70k unique usernames scraped from MyAnimeList.

Instances of '-1' in the data represent information unavailable on MyAnimeList as of 11/2023.
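
When loading these files, the '-1' placeholder usually needs to be treated as missing rather than as a real value. A minimal sketch, assuming rows are read as dictionaries; the function name and the sample record are hypothetical, and treating both the integer -1 and the string "-1" as the sentinel is an assumption about how the CSV values arrive.

```python
def clean_missing(record, sentinel=-1):
    """Replace the '-1' sentinel (as a number or a string) with None."""
    return {
        key: (None if value == sentinel or value == str(sentinel) else value)
        for key, value in record.items()
    }


# Hypothetical record; real rows may have different values.
record = {"anime_id": 1, "mean": -1, "studios": "-1", "num_episodes": 12}
print(clean_missing(record))
```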

Content

anime_titles.csv
- anime_id: MyAnimeList unique number ID
- title: full name of anime
- mean: average user rating on MyAnimeList (as of 11/2023)
- genres: comma separated list of genres
- studios: animation studio(s)
- synopsis: anime summary
- media_type: type of medium (tv, movie, ona)
- num_episodes: number of episode(s) per anime

anime_user_ratings.csv
- user_id: MyAnimeList username
- anime_id: MyAnimeList unique number ID
- title: full name of anime
- user_status: completion status (only limited to completed or dropped)
- user_score: user rating of anime
- user_eps_watched: number of episodes watched
- user_rewatch: if user is watching show again (Bool)
- updated_at: rating time stamp

anime_genres.csv
- anime_id: MyAnimeList unique number ID
- genre_(*): genre_(name of genre)

username_list_full.csv
- username: MyAnimeList username

Acknowledgements

Thanks to: 1. MyAnimeList website and API 2. Cooperunion Kaggle Dataset

Inspiration

To make anime accessible for both old and new viewers!
Anime Subtitles
kaggle.com
zip
Updated Aug 19, 2021
Cite
Jess Fan (2021). Anime Subtitles [Dataset]. https://www.kaggle.com/datasets/jef1056/anime-subtitles/code
Explore at:
Available download formats: zip (103874640 bytes)
Dataset updated
Aug 19, 2021
Authors
Jess Fan
Description
Content

The original extracted versions (in .srt and .ass format) are also included in this release (which, idk why, but kaggle decompressed >:U)

This dataset contains 1,497,770 messages across 3,836 episodes of anime. The raw dataset contains 1,563,442 messages, some of which were removed during cleaning.

This version (V4) adapts the original (frankly, terrible) format into the newer format I developed, which is used in https://github.com/JEF1056/clean-discord. The Dataset folder contains compressed text files, which are compatible with TensorFlow datasets. These can be streamed as a TextLineDataset in the TSV format.

V4 also fixes many (but not all) issues that the original cleaning script was too simple to realistically take care of. It also uses the clean-discord cleaner algorithms to make sentences read more like natural language than formatting. The script has also been optimized to run on multi-core systems, allowing it to complete cleaning this entire dataset in under 30 seconds on a 4-core machine. See the new and improved script here: https://github.com/JEF1056/clean-discord/blob/v1.2/misc/anime.py (no longer bundled in the dataset files)

Format

The files are now all compressed to save space, and are compatible with TensorFlow datasets. You can initialize a dataset function as such:

    import functools
    import os

    import tensorflow as tf

    # nq_tsv_path must be defined beforehand: it maps each split name
    # to the directory that holds that split's files.
    def dataset_fn_local(split, shuffle_files=False):
        global nq_tsv_path
        del shuffle_files  # unused
        # Load lines from the text files as examples.
        files_to_read = [
            os.path.join(nq_tsv_path[split], filename)
            for filename in os.listdir(nq_tsv_path[split])
            if filename.startswith(split)
        ]
        print(f"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Split {split} contains {len(files_to_read)} files. First 10: {files_to_read[0:10]}")
        # Read the gzip-compressed files line by line and drop empty lines.
        ds = tf.data.TextLineDataset(files_to_read, compression_type="GZIP").filter(
            lambda line: tf.not_equal(tf.strings.length(line), 0))
        ds = ds.shuffle(buffer_size=600000)
        # Split each tab-separated line into two string fields.
        ds = ds.map(
            functools.partial(tf.io.decode_csv, record_defaults=["", ""],
                              field_delim="\t", use_quote_delim=False),
            num_parallel_calls=tf.data.experimental.AUTOTUNE)
        ds = ds.map(lambda *ex: dict(zip(["question", "answer"], ex)))
        return ds

Acknowledgements

A sincere thanks to all of my friends for helping me come up with anime titles, a shoutout to the talented and dedicated people translating Japanese anime, and an even bigger thanks to Leen Chan for compiling the actual subtitles.

This dataset is far from complete! I hope that people who are willing to find, add and clean the data are out there, and could do their best to try and help out in the effort to grow this data
Full anime list (20k+) in MAL 2023
kaggle.com
zip
Updated Jul 19, 2023
Cite
crxxom (2023). Full anime list (20k+) in MAL 2023 [Dataset]. https://www.kaggle.com/datasets/crxxom/all-animes-in-mal
Explore at:
Available download formats: zip (4489441 bytes)
Dataset updated
Jul 19, 2023
Authors
crxxom
Description
This dataset contains detailed information on over 20k anime listed on myanimelist, with the following features:

1. title: title of the anime
2. episodes: number of episodes
3. status: whether the anime is still airing or has finished airing
4. theme: the theme of the anime
5. demographic: the demographic of the anime (e.g. shonen, shojo, seinen and josei)
6. genres: genres of the anime
7. type: whether the anime is a TV show, movie, etc.
8. favorites: the number of authenticated users that favorited the anime
9. popularity: the ranking of the anime based on the total member count compared to other anime
10. rank: the ranking of the anime based on the score compared to other anime
11. score: the average score of all authenticated users that made a public vote on the anime
12. members: total number of people that added the anime to their personal anime list (e.g. completed, watching, on-hold, dropped)
13. synopsis: plot of the anime
14. aired: when the anime aired
15. duration: the duration of the anime, e.g. duration per episode
16. premiered: the season in which the anime aired
17. studio: the studio that produces the anime

Note:
- Some of the anime in the list are considered 18+.
- All the data in the dataset is scraped from myanimelist.net; feel free to use the dataset as long as your use complies with their terms of use.
Bilibili Cells at Work
kaggle.com
zip
Updated Jun 7, 2021
Cite
Sherry (2021). Bilibili Cells at Work [Dataset]. https://www.kaggle.com/sherrytp/bilibili-cells-at-work
Explore at:
Available download formats: zip (7711786 bytes)
Dataset updated
Jun 7, 2021
Authors
Sherry
License
https://ec.europa.eu/info/legal-notice_en
Description
Context

Bilibili Cells at Work [Movie Review Data of A Popular Anime]

Bilibili.com is an Internet company based in Shanghai, China that went public on Nasdaq. Many people compare it to Youtube.com, but it adds more with its real-time commenting features and could be representative of the young generation in China.

Content

Version 1: Data collected as of May 10, 2019 on Bilibili.com. Version 2: Data collected as of June 6, 2021 on Bilibili.com.

Column Descriptions

author - Author of the review
score - Overall score out of 10 (i.e. 2, 4, 6, 8, 10)
disliked - Times of clicking dislike in the past
likes - Number of likes this comment received
liked - Times of clicking like in the past
ctime - N/A
content - Review
last_ep_index - Last episode watched or on
cursor - Cursor Number
date - Date when the review is written

New Change

star1-5 to calculate a like score: icon-star icon-star-light means the star was lighted, while icon-star means the star was not lighted.

Example

star1 = icon-star icon-star-light and star2-5 = icon-star : score 1 out of 5

star1-5 = icon-star icon-star-light : score 5 out of 5
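
The star rule above amounts to counting how many of the five star fields carry the lit class. A minimal sketch, assuming the star1 to star5 values for one review are available as a list of class strings; the function name and input layout are hypothetical and not taken from the dataset's own code.

```python
def star_score(star_classes):
    """Count lit stars: a star is lit when its class list includes 'icon-star-light'."""
    return sum(1 for classes in star_classes if "icon-star-light" in classes.split())


# star1 lit, star2-5 not lit: score 1 out of 5
one_star = ["icon-star icon-star-light"] + ["icon-star"] * 4
# star1-5 all lit: score 5 out of 5
five_stars = ["icon-star icon-star-light"] * 5

print(star_score(one_star))    # 1
print(star_score(five_stars))  # 5
```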

Acknowledgements

The comment data is scraped from https://www.bilibili.com/bangumi/media/md102392/#short, and contains only short comments (the long comments are more like a long article about viewer's thoughts, so may relate highly to their experience rather than the anime itself). Thanks to Bilibili.com for the copyright of this show and CSDN discussion forum for python scraping assistance.

Inspiration

I hope this dataset serves as an interesting NLP topic in anime reviews and foreign language study. Feel free to do any data visualization or text analysis on your own. Don't hesitate to ask me questions about the data or share your interesting ideas.
Anime Ratings
kaggle.com
zip
Updated May 12, 2022
Cite
Ali Ibrahim (2022). Anime Ratings [Dataset]. https://www.kaggle.com/datasets/aliibrahim10/anime-ratings/code
Explore at:
Available download formats: zip (500489 bytes)
Dataset updated
May 12, 2022
Authors
Ali Ibrahim
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Note - All data from MyAnimeList (MAL): https://myanimelist.net/

Description

List of all anime in the MyAnimeList database with titles, rankings, ratings and popularity scores. The dataset also includes related genres, if provided on the website, along with number of episodes, episode length and release season.

Many of the titles were not given certain attributes such as genre or release season, so those fields have null values. As a result, this dataset can be used to practice data cleaning.

Columns

Title

Name of the anime on the website.

Genres

Genres attributed to the anime; can contain a single genre or multiple genres separated by commas. NULL if no genre was given to the anime on the website.
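
A minimal sketch of how such a cell might be split into a list of genres. It assumes missing values reach Python as None, an empty string, or the literal text "NULL"; the actual encoding in the CSV is not stated in the description, and the function name and sample values are hypothetical.

```python
def split_genres(cell):
    """Return a list of genre names; an empty list when the cell is missing or NULL."""
    if cell is None or cell.strip() in ("", "NULL"):
        return []
    # Split on commas and drop surrounding whitespace and empty fragments.
    return [genre.strip() for genre in cell.split(",") if genre.strip()]


print(split_genres("Action, Adventure, Fantasy"))  # ['Action', 'Adventure', 'Fantasy']
print(split_genres("NULL"))                        # []
```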

Rank

Rank of anime based on the score given. The score calculation is explained in the Score section below.

"Please note that while R18+ entries calculate a weighted score, they are excluded from the rankings."

Popularity

Rank of anime based on the number of people subscribed to it on MyAnimeList.

Score

The score is a weighted number based on the data below.

Weighted Score = (v / (v + m)) * S + (m / (v + m)) * C

S = Average score for the anime/manga
v = Number of users giving a score for the anime/manga
m = Minimum number of scored users required to get a calculated score
C = The mean score across the entire Anime/Manga database

Source: https://myanimelist.net/info.php?go=topanime
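
The weighted score formula can be written directly as a small function. The input values below are hypothetical: the dataset description does not give the site-wide values of m and C, so they are placeholders chosen only to show how the formula behaves.

```python
def weighted_score(S, v, m, C):
    """Weighted Score = (v / (v + m)) * S + (m / (v + m)) * C."""
    return (v / (v + m)) * S + (m / (v + m)) * C


# Hypothetical inputs: same average score S, very different vote counts v.
few_votes = weighted_score(S=9.0, v=50, m=50, C=6.5)
many_votes = weighted_score(S=9.0, v=50_000, m=50, C=6.5)

print(round(few_votes, 2))   # 7.75
print(round(many_votes, 2))
```

With few votes the result is pulled toward the database mean C, and as v grows it approaches the title's own average S.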

Episodes

Total number of episodes in the anime. Ongoing anime have an episode value of "Unknown".

Episode length

Length of each individual episode in the format: x hr. xx min. per episode.
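
A hedged sketch of turning that text format into a number of minutes. It assumes the hour or minute part may be absent for some titles and that missing values are not parseable text; the function name and sample strings are hypothetical and not taken from the dataset.

```python
import re


def episode_minutes(text):
    """Convert text such as '1 hr. 30 min. per episode' to total minutes, or None."""
    if not text:
        return None
    hours = re.search(r"(\d+)\s*hr", text)
    minutes = re.search(r"(\d+)\s*min", text)
    if hours is None and minutes is None:
        return None  # e.g. a NULL or otherwise unparseable value
    total = 0
    if hours:
        total += int(hours.group(1)) * 60
    if minutes:
        total += int(minutes.group(1))
    return total


print(episode_minutes("1 hr. 30 min. per episode"))  # 90
print(episode_minutes("24 min. per episode"))        # 24
```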

Release date

Season and year of public release of the anime.
MyAnimeList Comment Dataset (MALCoD)
kaggle.com
zip
Updated May 11, 2023
Cite
NatLee (2023). MyAnimeList Comment Dataset (MALCoD) [Dataset]. https://www.kaggle.com/datasets/natlee/myanimelist-comment-dataset/code
Explore at:
Available download formats: zip (151699569 bytes)
Dataset updated
May 11, 2023
Authors
NatLee
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This dataset, comprising over 130K comments from MyAnimeList.net since 2006, was gathered using an open-source crawler program MyAnimeList-Comment-Crawler. Each comment is tagged with ratings from 0 to 10.

Note that the crawler has been updated to fit the new version of MyAnimeList. The new version of this dataset can be found in MALCoDv2.

Unlike similar datasets such as azathoth42/myanimelist and CooperUnion/anime-recommendations-database, this dataset focuses on comments.

Note: All information gathered here is publicly available; there was no need to register anywhere to access the data.

Content

animeReviewsOrderByTime.csv - Contains comments up until the first half of 2019 with details like comment ID, Anime work ID, Anime work name, post time, episodes watched at the time of comment, user, number of people who found the comment useful, various ratings, and the comment content.

animeListGenres.csv - Lists over 10K unique Anime works with details like Anime work ID, English name, synonyms of Japanese name, Japanese name, number of episodes, and genres.

animeList.csv - Contains the Anime work ID and its corresponding genres.

reviewsByWork.json - Provides annual summaries, including sentiment classification prediction values trained from another project GURA-gru-unit-for-recognizing-affect.

This dataset can be utilized for sentiment analysis, natural language processing, or to analyze Anime trends and genre popularity. An analysis on trends can be found in the code notebook analysis-trend-and-interest-with-the-tendency.

Acknowledgements

The dataset was independently crawled from MyAnimeList.net.

Citation

N. Lee, "MyAnimeList Comment Dataset (MALCoD)," Kaggle, 2019. [Online]. Available: https://www.kaggle.com/datasets/natlee/myanimelist-comment-dataset.
Death Note
kaggle.com
zip
Updated Jun 16, 2023
Cite
mgonzalz (2023). Death Note [Dataset]. https://www.kaggle.com/datasets/mariaglezhfhfhhf/death-note
Explore at:
Available download formats: zip (1197 bytes)
Dataset updated
Jun 16, 2023
Authors
mgonzalz
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The Death Note dataset contains relevant information about the anime and manga series, which follows the story of Light Yagami, at whose feet a mysterious notebook, the "Death Note", falls from the sky. This notebook grants him the power to kill anyone whose name he writes in it, triggering a series of 'suicides' in Japan and drawing the attention of the FBI, with L investigating these events.

The primary purpose of this dataset is to analyze Death Note's popularity, ratings, and narrative structure. Through columns such as "Votes" and "Rating", we can evaluate the reception and the opinion of the viewers in relation to each episode or chapter. This will allow us to identify the most popular episodes and also explore potential ratings trends throughout the series.

The Death Note dataset is a small dataset that is focused on helping people who are just starting out with data analysis to understand it. It is an ideal tool for those who want to practice their data analysis skills using an accessible and relevant data set.
Manhwa dataset
kaggle.com
zip
Updated Jul 11, 2023
Cite
crxxom (2023). Manhwa dataset [Dataset]. https://www.kaggle.com/datasets/crxxom/manhwa-dataset
Explore at:
Available download formats: zip (735858 bytes)
Dataset updated
Jul 11, 2023
Authors
crxxom
Description
This dataset contains all manhwa from MAL, including detailed descriptions and information regarding the manhwa to help you discover underrated manhwa!

Features included in this dataset:

type (all manhwa in this dataset)

title (title of the manhwa)

chapters (number of chapters)

status (whether it's finished or still on-going)

genres

favourites (number of people that favourited the manhwa on MAL)

popularity (a popularity ranking)

rank

score (average score of the manhwa)

members (number of members on MAL)

synopsis (the description/plot of the manhwa)

volumns (number of volumes)

authors

publish_time

Cite

Md Yasmi Tohabar Evon (2023). Anime Quest Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/6045074

Anime Quest Dataset

From Classics to Hidden Gems: Anime data about 22,888 anime entries

Explore at:

404 scholarly articles cite this dataset (View in Google Scholar)

Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Unique identifier

https://doi.org/10.34740/kaggle/dsv/6045074

Dataset updated

Jun 28, 2023

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Md Yasmi Tohabar Evon

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

This dataset contains information about anime scraped from Anime Planet on 28/06/2023. It contains information about anime (episodes, aired date, rating, genre, etc.), the favorite anime by country, and the top countries that watch the most anime.

The scraping program for this dataset is in the Anime.Quest GitHub repository.
A Tableau visualization of this dataset can also be found in Anime Quest: Visualization.

Content

The dataset contains 3 files:

📁 anime_data.csv:
1. Name: Full name of the anime
2. Media Type: TV, Web, Movie, etc.
3. Episodes: Total episodes of the anime
4. Studio: Name of the studios of the anime, from most recent to oldest.
5. Start Year: Release Year of the anime
6. End Year: Last year of the anime airing
7. Ongoing: Is the anime currently airing or not? True or False.
8. Release Season: Spring, Fall, Winter, and Summer
9. Rating: The global rating ranges from 0 to 5.
10. Rank: Global ranking of the anime
11. Members: Total members of the anime
12. Genre: The category of the anime
13. Creator: Creator of the anime

📁 anime_top_by_country_data.csv:
1. Country: Individual country name
2. Most Popular: The most popular anime in the country
3. 2nd Place: Second-most popular anime in the country
4. 3rd Place: Third-most popular anime in the country
5. 4th Place: Fourth-most popular anime in the country
6. 5th Place: The fifth-most popular anime in the country

📁 anime_watching_data.csv:
1. Rank: Ranking of countries based on the number of anime viewers
2. Country: Individual country name
3. Population: Total population of the country
4. Percentage of People Watching: Percentage of people watching anime in the country
5. Number of People Watching: Total number of people watching anime in the country
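
As a rough illustration of how the columns of anime_watching_data.csv relate to each other, the sketch below reads rows with the documented column names and checks whether the reported viewer count matches population times percentage. The sample rows are invented, and the code assumes plain numeric cells with the percentage on a 0 to 100 scale; the real file may format these values differently.

```python
import csv
import io

# Hypothetical rows using the documented column names; real values will differ.
SAMPLE = """Rank,Country,Population,Percentage of People Watching,Number of People Watching
1,Country A,1000000,25.0,250000
2,Country B,500000,10.0,50000
"""


def check_viewers(rows, tolerance=0.01):
    """Yield (country, ok), where ok says the viewer count matches population * percentage."""
    for row in rows:
        population = float(row["Population"])
        share = float(row["Percentage of People Watching"]) / 100  # assumes a 0-100 scale
        reported = float(row["Number of People Watching"])
        expected = population * share
        yield row["Country"], abs(reported - expected) <= tolerance * max(expected, 1)


results = list(check_viewers(csv.DictReader(io.StringIO(SAMPLE))))
print(results)
```

To run the same check on the real file, the StringIO object could be replaced with an opened file handle, after confirming how the numeric columns are formatted.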

Acknowledgements

The website Anime Planet was used to scrape this dataset. Please include citations for this dataset if you use it in your own research.

Inspiration

This dataset can be used to find the factors determining an anime's rating and ranking. Additionally, it can be used to make anime recommendations and to observe patterns across anime.
