26 datasets found
  1. Anime Quest Dataset

    • kaggle.com
    Updated Jun 28, 2023
    Cite
    Md Yasmi Tohabar Evon (2023). Anime Quest Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/6045074
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 28, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Md Yasmi Tohabar Evon
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    This dataset contains information about anime scraped from Anime Planet on 28/06/2023: details for each anime (episodes, air dates, rating, genre, etc.), the favorite anime in each country, and the countries that watch the most anime.

    Content

    The dataset contains 3 files:

    📁 anime_data.csv:
      1. Name: Full name of the anime
      2. Media Type: TV, Web, Movie, etc.
      3. Episodes: Total number of episodes
      4. Studio: Name(s) of the anime's studios, from most recent to oldest
      5. Start Year: Release year of the anime
      6. End Year: Last year the anime aired
      7. Ongoing: Whether the anime is currently airing (True or False)
      8. Release Season: Spring, Fall, Winter, or Summer
      9. Rating: Global rating, ranging from 0 to 5
      10. Rank: Global ranking of the anime
      11. Members: Total number of members for the anime
      12. Genre: Category of the anime
      13. Creator: Creator of the anime

    📁 anime_top_by_country_data.csv:
      1. Country: Country name
      2. Most Popular: Most popular anime in the country
      3. 2nd Place: Second-most popular anime in the country
      4. 3rd Place: Third-most popular anime in the country
      5. 4th Place: Fourth-most popular anime in the country
      6. 5th Place: Fifth-most popular anime in the country

    📁 anime_watching_data.csv:
      1. Rank: Ranking of countries by number of anime viewers
      2. Country: Country name
      3. Population: Total population of the country
      4. Percentage of People Watching: Percentage of the country's population that watches anime
      5. Number of People Watching: Total number of people in the country who watch anime
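    The three watching columns suggest a simple relationship that can be used as a sanity check. The Python sketch below assumes (the description does not say so) that Number of People Watching is approximately Population × Percentage of People Watching / 100, with the percentage on a 0 to 100 scale. The function names and the 1% tolerance are hypothetical, and the numbers in the example are made up.

```python
def expected_viewers(population: float, percent_watching: float) -> float:
    """Estimate viewers, assuming the percentage is on a 0-100 scale."""
    return population * percent_watching / 100.0


def is_consistent(population: float, percent_watching: float,
                  stated_viewers: float, rel_tol: float = 0.01) -> bool:
    """Check a stated viewer count against the estimate.

    rel_tol is a hypothetical tolerance meant to absorb rounding in scraped values.
    """
    expected = expected_viewers(population, percent_watching)
    if expected == 0:
        return stated_viewers == 0
    return abs(stated_viewers - expected) / expected <= rel_tol


# Made-up numbers, for illustration only (not taken from the dataset).
print(is_consistent(1_000_000, 25.0, 250_000))  # True
print(is_consistent(1_000_000, 25.0, 300_000))  # False
```

    Rows that fail the check may simply reflect rounding or a different percentage scale, so treat them as candidates for inspection rather than errors.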

    Acknowledgements

    This dataset was scraped from the Anime Planet website. Please cite this dataset if you use it in your own research.

    Inspiration

    This dataset can be used to find the factors that determine an anime's rating and ranking. It can also be used to make anime recommendations and to look for patterns across anime.

  2. Anime Dataset

    • kaggle.com
    Updated Jul 22, 2022
    Cite
    Tarun R Jain (2022). Anime Dataset [Dataset]. https://www.kaggle.com/datasets/tarundalal/anime-dataset
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 22, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tarun R Jain
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Anime Vyuh: A World Of Anime Dataset

    This dataset contains anime data for the last 10 years, covering the Summer, Winter, and Spring seasons. Columns:
      - Anime: anime title
      - Genre: anime genre
      - Description: synopsis of the anime
      - Studio: the animation studio
      - Year: release year, along with day and month
      - Rating: rating in stars

    The GitHub repository documents how the data was created and includes exploratory data analysis (EDA) and a machine-learning recommendation system. Please star it, and open a pull request with any suggestions: https://github.com/lucifertrj/AnimeWorldDataset_HUB

    Join our community on Discord: https://discord.com/invite/kxZYxdTKp6

    Credits: the dataset was scraped from MyAnimeList.

  3. Top 10000 Anime Movies ,OVA's and Tv-Shows

    • kaggle.com
    zip
    Updated Apr 11, 2021
    Cite
    Thomas Konstantin (2021). Top 10000 Anime Movies ,OVA's and Tv-Shows [Dataset]. https://www.kaggle.com/thomaskonstantin/top-10000-anime-movies-ovas-and-tvshows
    Available download formats: zip (2,226,960 bytes)
    Dataset updated
    Apr 11, 2021
    Authors
    Thomas Konstantin
    Description

    Context

    Anime, style of animation popular in Japanese films. Early anime films were intended primarily for the Japanese market and, as such, employed many cultural references unique to Japan. For example, the large eyes of anime characters are commonly perceived in Japan as multifaceted “windows to the soul.” Much of the genre is aimed at children, but anime films are sometimes marked by adult themes and subject matter. Modern anime began in 1956 and found lasting success in 1961 with the establishment of Mushi Productions by Osamu Tezuka, a leading figure in modern manga, the dense, novelistic Japanese comic book style that contributed greatly to the aesthetic of anime. - "britannica.com"

    Content

    The dataset contains information on the 10,000 most popular and best-known anime series, OVAs, and movies. For each entry, it records the rating, synopsis or description, air dates, and type.

    Acknowledgements

    All rights and credit for the information belong to https://myanimelist.net/, which provides the detailed information about the anime in this list. I do not own any rights to commercial use of this data.

    Inspiration

    With more and more people watching and enjoying anime (Netflix, for example, keeps adding anime movies and shows to its platform), a natural question is: what makes a good anime series? Is there a pattern that makes certain anime better than others? How different are their plots?

  4. Data from: Self-relevance of anime, sociability, and individual and...

    • jstagedata.jst.go.jp
    txt
    Updated Jul 27, 2023
    + more versions
    Cite
    Ryohei Kitazawa; Akinori Ono (2023). Self-relevance of anime, sociability, and individual and collective ownerships [Dataset]. http://doi.org/10.50998/data.marketing.23531094.v1
    Available download formats: txt
    Dataset updated
    Jul 27, 2023
    Dataset provided by
    Japan Marketing Academy
    Authors
    Ryohei Kitazawa; Akinori Ono
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    These data contain:
      (1) the manipulated type of anime consumer (fans/manias/otaku);
      (2) two variables for the manipulation check:
        (2a) anime relevance to the self (0-100), measured by six items: “That anime and I have a lot in common,” “That anime is central to my identity,” “That anime is part of who I am,” “I derived some of my identity from that anime,” “That anime helps me to achieve the identity I wished to have,” and “That anime helps me to narrow the gap between what I am and what I try to be”;
        (2b) sociability (1-7), measured by five items: “I like to be with people,” “I welcome the opportunity to mix socially with people,” “I prefer working with others rather than alone,” “I find people more stimulating than anything else,” and “I'd be unhappy if I were prevented from making many social contacts”;
      (3) consumer individual psychological ownership of the anime content (1-7), measured by four items: “This is MY anime,” “I sense that this anime is MINE,” “I feel a very high degree of personal ownership for this anime,” and “When I watch this anime it feels as though I own it”; and
      (4) consumer collective psychological ownership of the anime content (1-7), measured by three items: “Other consumers and I collectively sense that this anime is OURS,” “Other consumers and I collectively feel a very high degree of shared ownership for this anime,” and “Most consumers that watch this anime feel as though they own the anime.”

  5. MyAnimeList

    • kaggle.com
    zip
    Updated Sep 15, 2020
    Cite
    Quang-Vinh Do (2020). MyAnimeList [Dataset]. https://www.kaggle.com/datasets/qvinhdo/myanimelist
    Available download formats: zip (943,068,300 bytes)
    Dataset updated
    Sep 15, 2020
    Authors
    Quang-Vinh Do
    Description

    Context

    This dataset contains anime, users, and ratings scraped from MyAnimeList.net using its official API along with the Jikan API. A more detailed dataset of anime-list ratings is available at https://www.kaggle.com/azathoth42/myanimelist; however, we rescraped the anime lists to get more up-to-date information. We did use the list of user_ids from https://www.kaggle.com/azathoth42/myanimelist as the starting point for scraping anime lists. This dataset was last updated on September 14, 2020.

    Content

    This dataset contains 4 files:
      - animes.csv: 17,058 anime, with title, anime_id, airing status, number of episodes, and synopsis.
      - users.csv: 302,674 users, with basic user information such as username, gender, location, birthdate, and join date.
      - user_watches.csv: 68,235,827 user anime-list ratings, with the score and watch status.
      - mal_db.dump: a PostgreSQL database dump containing the data of all 3 CSV files above, with proper PK/FK, other constraints, and indexes.

    Note the meanings of certain values.

    Anime status (animes.csv):
      - 1: Currently Airing
      - 2: Finished Airing
      - 3: Not yet aired

    Watch status (user_watches.csv ratings):
      - 1: watching
      - 2: completed
      - 3: on hold
      - 4: dropped
      - 6: plan to watch
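    The integer codes above can be translated with a plain lookup table. A minimal Python sketch; ANIME_STATUS, WATCH_STATUS, and decode are hypothetical names, and since the description does not name the columns that hold these codes, the example works on bare integers. Code 5 is not listed for watch status, so unlisted codes are flagged rather than guessed.

```python
# Code-to-label mappings as listed in the dataset description.
ANIME_STATUS = {
    1: "Currently Airing",
    2: "Finished Airing",
    3: "Not yet aired",
}

# Code 5 does not appear among the documented watch statuses.
WATCH_STATUS = {
    1: "watching",
    2: "completed",
    3: "on hold",
    4: "dropped",
    6: "plan to watch",
}


def decode(code: int, mapping: dict) -> str:
    """Translate a numeric status code, flagging codes the description does not list."""
    return mapping.get(code, f"unknown ({code})")


print(decode(2, ANIME_STATUS))  # Finished Airing
print(decode(6, WATCH_STATUS))  # plan to watch
```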

    Acknowledgements

    https://www.kaggle.com/azathoth42/myanimelist for UserList.csv

  6. MyAnimelist Jikan Database

    • kaggle.com
    zip
    Updated Jul 23, 2022
    Cite
    Andreu Vall Hernàndez (2022). MyAnimelist Jikan Database [Dataset]. https://www.kaggle.com/datasets/andreuvallhernndez/myanimelist-jikan
    Available download formats: zip (43,830,799 bytes)
    Dataset updated
    Jul 23, 2022
    Authors
    Andreu Vall Hernàndez
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Jikan is a PHP & REST API for MyAnimeList. It has two main parts: a PHP library, which has several methods to scrape and parse a lot of data from MyAnimeList's desktop version; and a REST API, which uses that PHP library to provide a public API for obtaining certain data from MyAnimeList (in JSON format).

    To avoid overloading MyAnimeList, the REST API uses an internal MongoDB database to store and cache previously scraped data. Some entries are updated automatically once a day; others are updated only when requested through the REST API. This dataset consists of the 4 main collections scraped from the REST API's cache database: Animes, Characters, Mangas, and People.
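    The caching behaviour described above (serve a stored copy if it is recent enough, otherwise scrape again and store the result) is a read-through cache with a time-to-live. The Python sketch below only illustrates that idea and is not Jikan's code: an in-memory dict stands in for MongoDB, the one-day TTL mirrors the "once a day" refresh, and ReadThroughCache and the fetch callable are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple

ONE_DAY = 24 * 60 * 60  # seconds; stands in for the "once a day" refresh interval


class ReadThroughCache:
    """Illustrative read-through cache; a dict stands in for the MongoDB store."""

    def __init__(self, fetch: Callable[[str], Any], ttl: float = ONE_DAY,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch    # hypothetical callable that scrapes the source site
        self._ttl = ttl        # maximum age, in seconds, of an entry served from cache
        self._clock = clock    # injectable so the behaviour can be tested
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]           # fresh: served without contacting the site
        value = self._fetch(key)       # missing or stale: scrape again
        self._store[key] = (now, value)
        return value
```

    This version refreshes an entry only when it is requested after going stale; the scheduled daily updates mentioned above would need a separate background job.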

    The scraping was done on 17 July 2022 and took just under 3 hours 30 minutes. The scraping process is very simple and is available on GitHub.

    It contains:
      - 24,640 Animes
      - 146,049 Characters
      - 66,371 Mangas
      - 16,943 People

    The cleaning process is somewhat longer and is also explained on GitHub. It mainly consists of simplifying dictionary columns, adjusting some old values, and adding two new columns (nsfw and pending_approval).

    In the near future I'll post a more complete dataset relating Characters and Staff to Anime and Manga, together with the relations between Anime and Manga, and I'll update it weekly. That version will need much more complicated code and take much longer to scrape (over a day; I will also query the official MyAnimeList API to find which Anime and Manga have been updated, so that only modified entries are rescraped). This is the preliminary, deliberately simple, Jikan-only version.

    Many thanks to the Jikan API: studying its architecture was a lot of fun, and the data it scrapes from MyAnimeList is excellent.

  7. MyAnimeList Anime & Manga Dataset (July 2025)

    • kaggle.com
    zip
    Updated Aug 27, 2025
    Cite
    Hamza Ashfaq (2025). MyAnimeList Anime & Manga Dataset (July 2025) [Dataset]. https://www.kaggle.com/datasets/hamzaashfaque1999/myanimelist-scraped-data
    Available download formats: zip (27,845,136 bytes)
    Dataset updated
    Aug 27, 2025
    Authors
    Hamza Ashfaq
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    A dataset scraped from MyAnimeList.net, an anime and manga database. It contains two .csv files with 105,983 scraped entries in total: 28,635 anime entries and 77,348 manga entries.

    Schema for anime_entries.csv

    id: Unique identifier assigned by the website.
    link: URL for the entry.
    title_name: Title of the entry.
    score: Weighted average of scores given by users.
    scored_by: Number of people who scored the entry.
    ranked: Ranking of an entry.
    popularity: Popularity rank among shows, based on number of members (1st is the most popular).
    members: Number of users that have interacted with the entry.
    favorited: Number of users that have favorited the show.
    synonymns: Other titles by which the show is known.
    japanese_name: Name in Japanese.
    english_name: Name in English.
    german_name: Name in German.
    french_name: Name in French.
    spanish_name: Name in Spanish.
    item_type: Indicates whether an item is a TV series, a movie, an OVA, an ONA or a Special.
    episodes: Number of episodes in a given entry. Some entries may have this as "unknown".
    status: Indicates whether the show is Currently Airing, Finished Airing, or Not Aired Yet.
    airing_date: Date(s) when the show aired.
    premier_date: Date when the show premiered.
    broadcast_date: Day(s) on which the show is broadcast.
    producers: Parties responsible for the management of the anime production.
    licensors: Parties responsible for the distribution and servicing of the anime.
    studios: Parties responsible for the production of animation.
    source: The original material (e.g., manga or light novel) from which the anime was adapted.
    genres: What genres the anime can be categorized into.
    themes: What themes occur within the anime.
    demographic: The demographic the anime is marketed to.
    duration: Length of an episode.
    age_rating: Age rating of the entry (G, PG, PG-13, R, or Rx).
    description: Description of the anime.
    background: Background of the anime.

    Schema for manga_entries.csv

    id: Unique identifier assigned by the website.
    link: URL for the entry.
    title_name: Title of the entry.
    score: Weighted average of scores given by users.
    scored_by: Number of people who scored the entry.
    ranked: Ranking of an entry.
    popularity: Popularity rank among manga, based on number of members (1st is the most popular).
    members: Number of users that have interacted with the entry.
    favorited: Number of users that have favorited the manga.
    synonymns: Other titles by which the manga is known.
    japanese_name: Name in Japanese.
    english_name: Name in English.
    german_name: Name in German.
    french_name: Name in French.
    spanish_name: Name in Spanish.
    item_type: Indicates whether an item is a Manga, One-shot, Doujinshi, Light-Novel, Novel, Manhwa or Manhua.
    volumes: Total number of volumes in the series.
    chapters: Total number of chapters in the entire series.
    status: Indicates whether the manga is Finished, Publishing, On Hiatus, or Discontinued.
    publishing_date: Publication dates of the manga.
    authors: Party responsible for authoring the manga.
    serialization: Party responsible for the distribution of the manga.
    genres: What genres the manga can be categorized into.
    themes: What themes occur within the manga.
    demographic: The demographic the manga is marketed to.
    description: Description of the manga.
    background: Background of the manga.

    Important

    Columns containing lists are stored as JSON dumps, because the number of items varies from entry to entry.
    The "background" column in both datasets may be faulty or incomplete because the website's HTML is inconsistent.
    Because scraping MAL data takes a long time and the MAL website keeps updating its entries, some of the data in the "ranked" and "popularity" columns may be duplicated or shuffled.
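    Because list-valued columns are stored as JSON strings and episodes may be "unknown", rows need a little decoding after loading. A minimal sketch using only the Python standard library; which columns actually hold JSON lists is an assumption (LIST_COLUMNS is a guess based on the schema above), and parse_episodes, parse_row, and read_anime_entries are hypothetical helpers.

```python
import csv
import io
import json
from typing import Dict, Iterable, List, Optional

# Columns assumed (not confirmed by the description) to hold JSON-encoded lists.
LIST_COLUMNS = ("synonymns", "producers", "licensors", "studios", "genres", "themes")


def parse_episodes(raw: str) -> Optional[int]:
    """Return the episode count, or None when it is "unknown" or blank."""
    raw = raw.strip()
    if raw == "" or raw.lower() == "unknown":
        return None
    return int(float(raw))  # also tolerates values written as "12.0"


def parse_row(row: Dict[str, str],
              list_columns: Iterable[str] = LIST_COLUMNS) -> Dict[str, object]:
    """Decode JSON list columns and the episodes column of one anime_entries.csv row."""
    parsed: Dict[str, object] = dict(row)
    for col in list_columns:
        if row.get(col):  # skip absent or empty cells
            parsed[col] = json.loads(row[col])
    if "episodes" in row:
        parsed["episodes"] = parse_episodes(row["episodes"])
    return parsed


def read_anime_entries(text: str) -> List[Dict[str, object]]:
    """Parse CSV text (e.g. the contents of anime_entries.csv) into decoded rows."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(text))]
```

    The duplicated or shuffled "ranked" and "popularity" values mentioned above are not corrected by this sketch; deduplicate or re-rank separately if your analysis depends on them.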

  8. SoulWorker - Anime Action MMO Player Activity Dataset

    • mmo-population.com
    csv, json
    Cite
    MMO Populations, SoulWorker - Anime Action MMO Player Activity Dataset [Dataset]. https://mmo-population.com/game/soulworker-anime-action-mmo
    Available download formats: csv, json
    Dataset authored and provided by
    MMO Populations
    License

    https://mmo-population.com/terms

    Time period covered
    Oct 1, 2023 - Sep 1, 2025
    Variables measured
    date, index, trend_pct, source_steam, model_version, source_reddit, source_twitch, confidence_pct, players_bridged, players_enhanced, and 1 more
    Description

    SoulWorker - Anime Action MMO player activity dataset from MMO Populations, combining monthly enhanced players and 30-day daily estimates generated from public signals.

  9. open-image-preferences-v1-more-results-binarized

    • huggingface.co
    Updated Dec 19, 2024
    + more versions
    Cite
    Rapidata (2024). open-image-preferences-v1-more-results-binarized [Dataset]. https://huggingface.co/datasets/Rapidata/open-image-preferences-v1-more-results-binarized
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Dec 19, 2024
    Dataset authored and provided by
    Rapidata
    Description

    We wanted to contribute to the challenge posed by the data-is-better-together community (description below). We collected 170,000 preferences from people all around the world in roughly 3 days using our API (docs.rapidata.ai). If you find this dataset valuable and would like to see more in the future, please consider liking it.

    Dataset Card for image-preferences-results (Original)

    Prompt: Anime-style concept art of a Mayan Quetzalcoatl biomutant, dystopian world… See the full description on the dataset page: https://huggingface.co/datasets/Rapidata/open-image-preferences-v1-more-results-binarized.

  10. anime rating

    • kaggle.com
    zip
    Updated Oct 26, 2023
    Cite
    Mariyam Al Shatta (2023). anime rating [Dataset]. https://www.kaggle.com/datasets/mariyamalshatta/anime-rating
    Available download formats: zip (1,008,979 bytes)
    Dataset updated
    Oct 26, 2023
    Authors
    Mariyam Al Shatta
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Business Context

    Streaming media services deliver audio, video, and multimedia content on demand or in real time over a network, without users having to download the files to their systems. This saves users time and storage, and at the same time provides media owners with built-in copy protection. In today's digital space, streaming has become an influential medium for accessing information. Improved connectivity and advances in technology have made streaming services accessible to almost anyone with an internet connection, and surging demand for on-demand entertainment, such as entertainment programs and live matches, is boosting the adoption of streaming media services globally.

    Streamist is a streaming company that streams web series and movies to a worldwide audience. Every title on its portal is rated by viewers, and the portal also provides other information about each title, such as the number of people who have watched it, the number of people who want to watch it, the number of episodes, and the duration of an episode.

    Objective

    Streamist is currently focusing on the anime available in their portal and wants to identify the most important factors involved in rating an anime. As a data scientist at Streamist, you are tasked with analyzing the portal's anime data and identifying the important factors by building a predictive model to predict the rating of an anime.

    Data Dictionary

    Each record in the database provides a description of an anime. A detailed data dictionary can be found below.

    title: title of the anime
    mediaType: format of publication
    eps: number of episodes (movies are counted as 1 episode)
    duration: duration of an episode in minutes
    startYr: year airing started
    finishYr: year airing finished
    description: synopsis of the plot
    contentWarn: content warning
    watched: number of users who completed it
    watching: number of users currently watching it
    rating: average user rating
    votes: number of votes that contribute to the rating
    studio_primary: studios responsible for creation
    studios_colab: whether studios collaborated on the anime's production
    genre: genre to which the anime belongs
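    For the rating-prediction task, a couple of features can be derived directly from the data dictionary. A minimal Python sketch, assuming each record is a dict keyed by the column names above with numeric values (None when missing); derived_features and its output keys are hypothetical, and whether these features help the model is not claimed by the source.

```python
from typing import Dict, Optional


def derived_features(row: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Compute two illustrative features from one record; missing inputs give None."""
    eps, duration = row.get("eps"), row.get("duration")
    start, finish = row.get("startYr"), row.get("finishYr")
    total = eps * duration if eps is not None and duration is not None else None
    span = finish - start if start is not None and finish is not None else None
    return {
        "total_minutes": total,  # movies count as 1 episode, so this is their duration
        "years_airing": span,    # 0 when the anime started and finished in the same year
    }


# A hypothetical 24-episode series with 24-minute episodes: 24 * 24 = 576 minutes.
print(derived_features({"eps": 24, "duration": 24, "startYr": 2010, "finishYr": 2011}))
```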

  11. Anime Ratings

    • kaggle.com
    zip
    Updated May 2, 2025
    Cite
    Aphotinel Onagwa (2025). Anime Ratings [Dataset]. https://www.kaggle.com/datasets/aphotinel/anime-ratings/data
    Available download formats: zip (22,038 bytes)
    Dataset updated
    May 2, 2025
    Authors
    Aphotinel Onagwa
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset contains information about 1,000 anime, including their title, genre, number of episodes, type (TV, Movie, OVA), user rating, and the number of members who rated each anime. It provides a snapshot of popular anime across different genres and formats, useful for analysis or building recommendation systems.

  12. Top 100 Anime - AnimeList

    • kaggle.com
    zip
    Updated Jun 18, 2022
    Cite
    Naman Srivastava (2022). Top 100 Anime - AnimeList [Dataset]. https://www.kaggle.com/datasets/srivnaman/top-100-anime-animelist
    Available download formats: zip (3,055 bytes)
    Dataset updated
    Jun 18, 2022
    Authors
    Naman Srivastava
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    People across the globe love to watch anime. It has everything from emotion and drama to romance and mind-blowing action. Among the hundreds of anime shows and series out there, which is best?

    Content

    This dataset contains information on the top 100 anime shows aired on TV according to myanimelist.com.

  13. MyAnimeList User Ratings + Anime Dataset

    • kaggle.com
    zip
    Updated Dec 8, 2023
    Cite
    andrewgatchalian (2023). MyAnimeList User Ratings + Anime Dataset [Dataset]. https://www.kaggle.com/datasets/andrewgatchalian/myanimelist-user-ratings
    Available download formats: zip (356,047,866 bytes)
    Dataset updated
    Dec 8, 2023
    Authors
    andrewgatchalian
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Context

    This dataset contains over 44 million rows of anime-related data, representing interactions from 18,145 unique users and 16,135 unique shows.

    Source

    The data was sourced directly from MyAnimeList, one of the largest anime-focused community sites, which allows users to track, rate, and review anime.

    The dataset contains 4 files:
      - anime_titles.csv: a list of all available anime titles on MyAnimeList (IDs 1 to ~52,000), pulled using MyAnimeList's API (last updated 11/2023).
      - anime_user_ratings.csv: over 40 million ratings from over 18k users, pulled from the MyAnimeList API. The data was cleaned to include only 'Completed' and 'Dropped' statuses and non-zero ratings ('0' ratings were removed because they overlapped with users who did not rate titles). NOTE: Users are a random sample from over 70k unique usernames we collected; the usernames were web-scraped from the MyAnimeList Recently Online Users tab over the course of November 2023.
      - anime_genres.csv: dummy variables for all available genres in the dataset (for modeling purposes).
      - username_list_full.csv: a list of over 70k unique usernames scraped from MyAnimeList.

    Instances of '-1' in the data represent information unavailable on MyAnimeList as of 11/2023.
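    The shipped anime_user_ratings.csv is already filtered, but the same rule (keep 'Completed' and 'Dropped' rows with a non-zero score) and the '-1' convention matter if you pull fresh data or join the files. A minimal Python sketch, assuming rows are dicts keyed by the column names listed below with numeric fields already converted to int; clean_ratings is a hypothetical helper, and treating -1 as "missing" in every column follows the note above.

```python
from typing import Any, Dict, Iterable, Iterator

KEPT_STATUSES = {"completed", "dropped"}  # statuses the description says were kept
MISSING = -1  # marks information unavailable on MyAnimeList as of 11/2023


def clean_ratings(rows: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Keep completed/dropped rows with a non-zero score and map -1 values to None."""
    for row in rows:
        if str(row.get("user_status", "")).lower() not in KEPT_STATUSES:
            continue  # e.g. "watching" or "plan to watch"
        if row.get("user_score") == 0:
            continue  # a 0 score is indistinguishable from "not rated"
        yield {k: (None if v == MISSING else v) for k, v in row.items()}
```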

    Content

    anime_titles.csv
      - anime_id: MyAnimeList unique numeric ID
      - title: full name of the anime
      - mean: average user rating on MyAnimeList (as of 11/2023)
      - genres: comma-separated list of genres
      - studios: animation studio(s)
      - synopsis: anime summary
      - media_type: type of medium (tv, movie, ona)
      - num_episodes: number of episodes

    anime_user_ratings.csv
      - user_id: MyAnimeList username
      - anime_id: MyAnimeList unique numeric ID
      - title: full name of the anime
      - user_status: completion status (limited to completed or dropped)
      - user_score: user's rating of the anime
      - user_eps_watched: number of episodes watched
      - user_rewatch: whether the user is rewatching the show (Boolean)
      - updated_at: rating timestamp

    anime_genres.csv
      - anime_id: MyAnimeList unique numeric ID
      - genre_(*): one dummy variable per genre, named genre_(name of genre)

    username_list_full.csv
      - username: MyAnimeList username
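    Dummy columns like those in anime_genres.csv can in principle be rebuilt from the comma-separated genres column of anime_titles.csv. A minimal Python sketch, assuming 0/1 encoding and column names of the form genre_<name>; the exact spelling and encoding in the shipped file may differ, and genre_dummies is a hypothetical helper. pandas users can get a similar result with Series.str.get_dummies(sep=", ") if the separator is exactly a comma followed by a space.

```python
from typing import Dict


def genre_dummies(genres_by_id: Dict[int, str]) -> Dict[int, Dict[str, int]]:
    """Turn comma-separated genre strings into 0/1 columns named genre_<name>."""
    parsed = {
        anime_id: [g.strip() for g in genres.split(",") if g.strip()]
        for anime_id, genres in genres_by_id.items()
    }
    # One column per genre seen anywhere, in a stable (sorted) order.
    all_genres = sorted({g for gs in parsed.values() for g in gs})
    return {
        anime_id: {f"genre_{g}": int(g in gs) for g in all_genres}
        for anime_id, gs in parsed.items()
    }


print(genre_dummies({1: "Action, Drama", 2: "Comedy"})[2])
```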

    Acknowledgements

    Thanks to:
      1. MyAnimeList website and API
      2. Cooperunion Kaggle Dataset

    Inspiration

    To make anime accessible for both old and new viewers!

  14. Anime Subtitles

    • kaggle.com
    zip
    Updated Aug 19, 2021
    Cite
    Jess Fan (2021). Anime Subtitles [Dataset]. https://www.kaggle.com/datasets/jef1056/anime-subtitles/code
    Available download formats: zip (103,874,640 bytes)
    Dataset updated
    Aug 19, 2021
    Authors
    Jess Fan
    Description

    Content

    The original extracted versions (in .srt and .ass format) are also included in this release (which Kaggle, for some reason, decompressed).

    This dataset contains 1,497,770 messages across 3,836 episodes of anime. The raw dataset contains 1,563,442 messages, some of which were removed during cleaning.

    This version (V4) converts the original (frankly, terrible) format into the newer format I developed, which is used in https://github.com/JEF1056/clean-discord. The Dataset folder contains compressed text files that are compatible with TensorFlow's tf.data API. They can be streamed as a TextLineDataset in TSV format.

    V4 also fixes many (but not all) issues that the original cleaning script was too simple to handle. It also uses the clean-discord cleaning algorithms so that sentences read as natural language rather than as formatting. The script has also been optimized for multi-core systems, allowing it to clean the entire dataset in under 30 seconds on a 4-core machine. See the new and improved script here: https://github.com/JEF1056/clean-discord/blob/v1.2/misc/anime.py (no longer bundled in the dataset files)

    Format

    The files are all compressed to save space and can be read with TensorFlow's tf.data API. You can define a dataset function as follows (nq_tsv_path must already be defined as a dict mapping each split name to the directory containing that split's files):

        import functools
        import os

        import tensorflow as tf


        def dataset_fn_local(split, shuffle_files=False):
            del shuffle_files  # accepted for interface compatibility, but unused
            # Collect the files in the split's directory whose names start with the split name.
            files_to_read = [
                os.path.join(nq_tsv_path[split], filename)
                for filename in os.listdir(nq_tsv_path[split])
                if filename.startswith(split)
            ]
            print(f"Split {split} contains {len(files_to_read)} files. First 10: {files_to_read[:10]}")
            # Load lines from the gzipped text files as examples, skipping empty lines.
            ds = tf.data.TextLineDataset(files_to_read, compression_type="GZIP").filter(
                lambda line: tf.not_equal(tf.strings.length(line), 0))
            ds = ds.shuffle(buffer_size=600000)
            # Split each line at the tab into two string fields (no quote handling).
            ds = ds.map(
                functools.partial(tf.io.decode_csv, record_defaults=["", ""],
                                  field_delim="\t", use_quote_delim=False),
                num_parallel_calls=tf.data.AUTOTUNE)
            # Name the two fields "question" and "answer".
            ds = ds.map(lambda *ex: dict(zip(["question", "answer"], ex)))
            return ds

    Acknowledgements

    A sincere thanks to all of my friends for helping me come up with anime titles, a shoutout to the talented and dedicated people translating Japanese anime, and an even bigger thanks to Leen Chan for compiling the actual subtitles.

    This dataset is far from complete! I hope there are people out there willing to find, add, and clean data, and to help grow this dataset.

  15. Full anime list (20k+) in MAL 2023

    • kaggle.com
    zip
    Updated Jul 19, 2023
    Cite
    crxxom (2023). Full anime list (20k+) in MAL 2023 [Dataset]. https://www.kaggle.com/datasets/crxxom/all-animes-in-mal
    Available download formats: zip (4,489,441 bytes)
    Dataset updated
    Jul 19, 2023
    Authors
    crxxom
    Description

    This dataset contains detailed information on over 20,000 anime listed on MyAnimeList.

    Notes:
      - Some of the anime in the list are considered 18+.
      - All the data in the dataset is scraped from myanimelist.net; feel free to use the dataset as long as your use complies with their terms of use.

    Features:
      1. title: title of the anime
      2. episodes: number of episodes
      3. status: whether the anime is still airing or has finished airing
      4. theme: the theme of the anime
      5. demographic: the demographic of the anime (e.g., shonen, shojo, seinen, and josei)
      6. genres: genres of the anime
      7. type: whether the anime is a TV show, movie, etc.
      8. favorites: the number of authenticated users who favorited the anime
      9. popularity: the anime's rank by total member count compared to other anime
      10. rank: the anime's rank by score compared to other anime
      11. score: the average score from all authenticated users who cast a public vote on the anime
      12. members: total number of people who added the anime to their personal anime list (e.g., completed, watching, on-hold, dropped)
      13. synopsis: plot of the anime
      14. aired: when the anime aired
      15. duration: the duration of the anime, e.g., duration per episode
      16. premiered: the season in which the anime aired
      17. studio: the studio that produced the anime

  16. Bilibili Cells at Work

    • kaggle.com
    zip
    Updated Jun 7, 2021
    Sherry (2021). Bilibili Cells at Work [Dataset]. https://www.kaggle.com/sherrytp/bilibili-cells-at-work
    Available download formats: zip (7711786 bytes)
    Dataset updated
    Jun 7, 2021
    Authors
    Sherry
    License

    https://ec.europa.eu/info/legal-notice_en

    Description

    Context

    Bilibili Cells at Work [Movie Review Data of A Popular Anime]

    Bilibili.com is an Internet company based in Shanghai, China, that went public on Nasdaq. Many people compare it to YouTube.com, but its real-time commenting features set it apart, and it can be seen as representative of China's younger generation.

    Content

    Version 1: Data collected as of May 10, 2019 on Bilibili.com. Version 2: Data collected as of June 6, 2021 on Bilibili.com.

    Column Descriptions

    author - Author of the review
    score - Overall score out of 10 (i.e. 2, 4, 6, 8, or 10)
    disliked - Number of times dislike was clicked in the past
    likes - Number of likes this comment received
    liked - Number of times like was clicked in the past
    ctime - N/A
    content - Review text
    last_ep_index - Last episode watched, or the episode currently being watched
    cursor - Cursor number
    date - Date when the review was written

    New Change

    The star1 to star5 columns can be used to calculate a like score: a value of icon-star icon-star-light means the star was lit, while icon-star alone means the star was not lit.

    Example

    star1 = icon-star icon-star-light and star2-5 = icon-star : score 1 out of 5

    star1-5 = icon-star icon-star-light : score 5 out of 5
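    The star rule above can be applied mechanically. Below is a minimal sketch, assuming one review's star1 to star5 values are available as a list of class strings exactly as in the examples. `star_score` is a hypothetical helper, not part of the dataset.

```python
from typing import List


def star_score(star_classes: List[str]) -> int:
    """Count the lit stars among one review's star1..star5 values.

    A star is lit when its class list contains "icon-star-light";
    a bare "icon-star" is unlit.
    """
    # split() compares whole class names, so "icon-star" alone never matches.
    return sum("icon-star-light" in cls.split() for cls in star_classes)


print(star_score(["icon-star icon-star-light"] + ["icon-star"] * 4))  # 1
```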

    Acknowledgements

    The comment data is scraped from https://www.bilibili.com/bangumi/media/md102392/#short and contains only short comments (long comments read more like articles about viewers' thoughts, so they may relate more to personal experience than to the anime itself). Credit for the show's copyright goes to Bilibili.com; thanks also to the CSDN discussion forum for Python scraping assistance.

    Inspiration

    I hope this dataset serves as an interesting NLP topic for anime reviews and foreign-language study. Feel free to do any data visualization or text analysis on your own. Don't hesitate to ask me questions about the data or to share your interesting ideas.

  17. Anime Ratings

    • kaggle.com
    zip
    Updated May 12, 2022
    Ali Ibrahim (2022). Anime Ratings [Dataset]. https://www.kaggle.com/datasets/aliibrahim10/anime-ratings/code
    Available download formats: zip (500489 bytes)
    Dataset updated
    May 12, 2022
    Authors
    Ali Ibrahim
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Note - All data from MyAnimeList (MAL): https://myanimelist.net/


    A list of all anime in the MyAnimeList database, with titles, rankings, ratings, and popularity scores. The dataset also includes genres (where provided on the website), number of episodes, episode length, and release season.

    Many titles were not given certain attributes, such as genre or release season, so those fields hold null values. As a result, this dataset can also be used to practice data cleaning.

    Columns

    Title

    Name of the anime on the website.

    Genres

    Genres attributed to the anime; this can be a single genre or multiple genres separated by commas. The value is NULL if no genre was given to the anime on the website.
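    Because Genres packs several values into one field, it usually needs splitting before analysis. Below is a minimal sketch. The hypothetical `parse_genres` helper assumes a missing genre arrives as None, an empty string, or the literal text NULL, depending on how the file is loaded.

```python
from typing import List, Optional


def parse_genres(raw: Optional[str]) -> List[str]:
    """Split a comma-separated Genres value into a list of genre names.

    Assumption: None, "" and the literal "NULL" all mean "no genre".
    """
    if raw is None or raw.strip() in ("", "NULL"):
        return []
    # Strip the spaces that usually follow each comma; drop empty pieces.
    return [g.strip() for g in raw.split(",") if g.strip()]


print(parse_genres("Action, Drama"))  # ['Action', 'Drama']
```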

    Rank

    Rank of the anime based on its score. The score calculation is explained in the Score section below.

    "Please note that while R18+ entries calculate a weighted score, they are excluded from the rankings."

    Popularity

    Rank of the anime based on the number of people subscribed to it on MyAnimeList.

    Score

    The score is a weighted number based on the data below.

    Weighted Score = (v / (v + m)) * S + (m / (v + m)) * C

    where S is the average score for the anime/manga, v is the number of users giving a score for the anime/manga, m is the minimum number of scored users required to get a calculated score, and C is the mean score across the entire anime/manga database.

    Source: https://myanimelist.net/info.php?go=topanime
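    The weighted score formula translates directly into code. Below is a minimal sketch. The inputs are made-up numbers, because the actual values of m and C used by MyAnimeList are not given here.

```python
def weighted_score(S: float, v: int, m: int, C: float) -> float:
    """Weighted score = (v / (v + m)) * S + (m / (v + m)) * C.

    S: average score for the title
    v: number of users who scored the title
    m: minimum number of scored users required for a calculated score
    C: mean score across the whole database
    """
    total = v + m
    return (v / total) * S + (m / total) * C


# Hypothetical inputs: with v == m the result is the midpoint of S and C.
print(weighted_score(S=8.0, v=100, m=100, C=6.0))  # 7.0
```

    With few votes (small v), the result is pulled toward the database mean C. As v grows, it approaches the title's own average S.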

    Episodes

    Total number of episodes in the anime. Ongoing anime have an episode value of "Unknown".

    Episode length

    Length of each individual episode in the format: x hr. xx min. per episode.
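    Episode length is stored as text, so comparing lengths requires parsing. Below is a minimal sketch that converts the stated format to minutes. It assumes values follow the "x hr. xx min. per episode" pattern, with either part optional. Other units (such as seconds) are not handled, and `episode_minutes` is a hypothetical name.

```python
import re
from typing import Optional


def episode_minutes(text: str) -> Optional[int]:
    """Convert e.g. "1 hr. 30 min. per episode" to 90.

    Returns None when neither an "hr." nor a "min." part is found.
    """
    hours = re.search(r"(\d+)\s*hr\.", text)
    minutes = re.search(r"(\d+)\s*min\.", text)
    if hours is None and minutes is None:
        return None  # e.g. a missing or unrecognised value
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


print(episode_minutes("1 hr. 30 min. per episode"))  # 90
```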

    Release date

    Season and year of public release of the anime.

  18. MyAnimeList Comment Dataset (MALCoD)

    • kaggle.com
    zip
    Updated May 11, 2023
    NatLee (2023). MyAnimeList Comment Dataset (MALCoD) [Dataset]. https://www.kaggle.com/datasets/natlee/myanimelist-comment-dataset/code
    Available download formats: zip (151699569 bytes)
    Dataset updated
    May 11, 2023
    Authors
    NatLee
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset comprises over 130K comments posted on MyAnimeList.net since 2006, gathered with the open-source crawler MyAnimeList-Comment-Crawler. Each comment is tagged with ratings from 0 to 10.

    Note that the crawler has since been updated for the new version of MyAnimeList. The new version of this dataset can be found in MALCoDv2.

    Unlike similar datasets like azathoth42/myanimelist and CooperUnion/anime-recommendations-database, this dataset focuses on comments.

    Note: All information gathered here is publicly available; no registration was needed to access the data.

    Content

    1. animeReviewsOrderByTime.csv - Contains comments up until the first half of 2019 with details like comment ID, Anime work ID, Anime work name, post time, episodes watched at the time of comment, user, number of people who found the comment useful, various ratings, and the comment content.

    2. animeListGenres.csv - Lists over 10K unique Anime works with details like Anime work ID, English name, synonyms of Japanese name, Japanese name, number of episodes, and genres.

    3. animeList.csv - Contains the Anime work ID and its corresponding genres.

    4. reviewsByWork.json - Provides annual summaries, including sentiment classification prediction values trained from another project GURA-gru-unit-for-recognizing-affect.

    This dataset can be used for sentiment analysis, natural language processing, or analyzing anime trends and genre popularity. An analysis of trends can be found in the code notebook analysis-trend-and-interest-with-the-tendency.
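    As a starting point for a genre-popularity analysis, below is a minimal sketch that averages review ratings per genre. It assumes the reviews and the genre lists have already been loaded into the simple in-memory shapes described in the docstring. The actual CSV column names are not listed here, so the loading step is omitted, and `mean_rating_by_genre` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mean_rating_by_genre(
    reviews: Iterable[Tuple[int, float]],
    genres_by_work: Dict[int, List[str]],
) -> Dict[str, float]:
    """Average review rating per genre.

    reviews: (work_id, rating) pairs (assumed shape)
    genres_by_work: work_id -> list of genre names (assumed shape)
    A review counts toward every genre of its anime work.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for work_id, rating in reviews:
        for genre in genres_by_work.get(work_id, []):
            totals[genre] += rating
            counts[genre] += 1
    return {genre: totals[genre] / counts[genre] for genre in totals}
```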

    Acknowledgements

    The dataset was independently crawled from MyAnimeList.net.

    Citation

    N. Lee, "MyAnimeList Comment Dataset (MALCoD)," Kaggle, 2019. [Online]. Available: https://www.kaggle.com/datasets/natlee/myanimelist-comment-dataset.
    
  19. Death Note

    • kaggle.com
    zip
    Updated Jun 16, 2023
    mgonzalz (2023). Death Note [Dataset]. https://www.kaggle.com/datasets/mariaglezhfhfhhf/death-note
    Available download formats: zip (1197 bytes)
    Dataset updated
    Jun 16, 2023
    Authors
    mgonzalz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Death Note dataset contains information about the anime and manga series, which follows Light Yagami, at whose feet a mysterious notebook, the "Death Note", falls from the sky. The notebook grants him the power to kill anyone whose name he writes in it, triggering a wave of 'suicides' in Japan and drawing the attention of the FBI and of L, who investigates these events.

    The primary purpose of this dataset is to analyze Death Note's popularity, ratings, and narrative structure. The "Votes" and "Rating" columns make it possible to evaluate how viewers received each episode or chapter, identify the most popular episodes, and explore rating trends across the series.
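    The two analyses described above, finding the most popular episode and looking at rating trends, can be sketched as follows. This assumes each row is a dict with numeric "Votes" and "Rating" values. The "Episode" key and the helper names are hypothetical, and a trailing moving average is just one simple way to smooth the trend.

```python
from typing import Dict, List


def most_voted(rows: List[Dict]) -> Dict:
    """Return the row with the highest "Votes" value."""
    return max(rows, key=lambda row: row["Votes"])


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Trailing moving average; the first window - 1 points have no value."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical rows; real values come from the dataset.
rows = [
    {"Episode": 1, "Rating": 8.5, "Votes": 100},
    {"Episode": 2, "Rating": 9.0, "Votes": 250},
    {"Episode": 3, "Rating": 8.0, "Votes": 90},
]
print(most_voted(rows)["Episode"])  # 2
```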

    The Death Note dataset is small and aimed at people who are just starting out with data analysis. It is an ideal tool for those who want to practice their data analysis skills on an accessible and relevant dataset.

  20. Manhwa dataset

    • kaggle.com
    zip
    Updated Jul 11, 2023
    crxxom (2023). Manhwa dataset [Dataset]. https://www.kaggle.com/datasets/crxxom/manhwa-dataset
    Available download formats: zip (735858 bytes)
    Dataset updated
    Jul 11, 2023
    Authors
    crxxom
    Description

    This dataset contains all manhwa from MAL, including detailed descriptions and information regarding the manhwa to help you discover underrated manhwa!

    Features included in this dataset:

    1. type (all entries are manhwa)
    2. title (title of the manhwa)
    3. chapters (number of chapters)
    4. status (whether it is finished or still ongoing)
    5. genres
    6. favourites (number of people who favourited the manhwa on MAL)
    7. popularity (a popularity ranking)
    8. rank
    9. score (average score of the manhwa)
    10. members (number of members on MAL)
    11. synopsis (the description/plot of the manhwa)
    12. volumns (number of volumes)
    13. authors
    14. publish_time
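    One way to act on the "underrated manhwa" idea is to look for titles with a high score but a low popularity ranking. Below is a minimal sketch. It assumes popularity is a rank where 1 means most popular and that both score and popularity are numeric. The thresholds and the `underrated` name are arbitrary, hypothetical choices.

```python
from typing import Dict, List


def underrated(
    entries: List[Dict],
    min_score: float = 8.0,
    min_popularity_rank: int = 1000,
) -> List[Dict]:
    """Highly scored entries that sit far down the popularity ranking.

    A larger popularity number is assumed to mean fewer readers.
    Results are sorted by score, highest first.
    """
    picks = [
        e for e in entries
        if e["score"] >= min_score and e["popularity"] >= min_popularity_rank
    ]
    return sorted(picks, key=lambda e: e["score"], reverse=True)
```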
Md Yasmi Tohabar Evon (2023). Anime Quest Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/6045074

Anime Quest Dataset

From Classics to Hidden Gems: Anime data about 22,888 anime entries

404 scholarly articles cite this dataset (per Google Scholar).
Dataset updated
Jun 28, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Md Yasmi Tohabar Evon
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

This dataset contains information about anime scraped from Anime Planet on 28/06/2023. It covers individual anime (episodes, air dates, rating, genre, etc.), the favorite anime in each country, and the countries that watch the most anime.

Content

The dataset contains 3 files:

📁 anime_data.csv:
1. Name: Full name of the anime
2. Media Type: TV, Web, Movie, etc.
3. Episodes: Total episodes of the anime
4. Studio: Names of the anime's studios, from most recent to oldest
5. Start Year: Release year of the anime
6. End Year: Last year the anime aired
7. Ongoing: Whether the anime is currently airing (True or False)
8. Release Season: Spring, Fall, Winter, or Summer
9. Rating: Global rating, ranging from 0 to 5
10. Rank: Global ranking of the anime
11. Members: Total members of the anime
12. Genre: The category of the anime
13. Creator: Creator of the anime

📁 anime_top_by_country_data.csv:
1. Country: Individual country name
2. Most Popular: The most popular anime in the country
3. 2nd Place: Second-most popular anime in the country
4. 3rd Place: Third-most popular anime in the country
5. 4th Place: Fourth-most popular anime in the country
6. 5th Place: Fifth-most popular anime in the country

📁 anime_watching_data.csv:
1. Rank: Ranking of countries by number of anime viewers
2. Country: Individual country name
3. Population: Total population of the country
4. Percentage of People Watching: Percentage of the country's population that watches anime
5. Number of People Watching: Total number of people in the country who watch anime
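The last two columns of anime_watching_data.csv should agree with Population. Below is a minimal consistency check. It assumes Number of People Watching is roughly Population × Percentage / 100 and that the percentage is stored as a plain number (for example 12.5, not "12.5%"). The helper names and the 1% tolerance are hypothetical.

```python
def expected_viewers(population: int, percentage: float) -> int:
    """Viewers implied by a population and a watching percentage."""
    return round(population * percentage / 100)


def is_consistent(
    population: int, percentage: float, reported: int, tolerance: float = 0.01
) -> bool:
    """True if the reported count is within `tolerance` (relative) of the implied count."""
    expected = expected_viewers(population, percentage)
    return abs(reported - expected) <= tolerance * max(expected, 1)


# Hypothetical row: 12.5% of 1,000,000 people.
print(expected_viewers(1_000_000, 12.5))  # 125000
```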

Acknowledgements

This dataset was scraped from the Anime Planet website. Please cite this dataset if you use it in your own research.

Inspiration

This dataset can be used to find the factors that determine an anime's rating and ranking. It can also be used to build anime recommendations and to observe patterns across anime.
