100+ datasets found
  1. MoVi: A Large Multipurpose Motion and Video Dataset

    • borealisdata.ca
    • search.dataone.org
    Updated Jun 29, 2021
    Cite
    Saeed Ghorbani; Kimia Mahdaviani; Anne Thaler; Konrad Kording; Douglas James Cook; Gunnar Blohm; Nikolaus F. Troje (2021). MoVi: A Large Multipurpose Motion and Video Dataset [Dataset]. http://doi.org/10.5683/SP2/JRHDRN
    Dataset updated
    Jun 29, 2021
    Dataset provided by
    Borealis
    Authors
    Saeed Ghorbani; Kimia Mahdaviani; Anne Thaler; Konrad Kording; Douglas James Cook; Gunnar Blohm; Nikolaus F. Troje
    License

    https://borealisdata.ca/api/datasets/:persistentId/versions/5.0/customlicense?persistentId=doi:10.5683/SP2/JRHDRN

    Description

    MoVi is the first human motion dataset to contain synchronized pose, pose-dependent shape and video recordings. The MoVi database can be applied in human pose estimation and tracking, human motion prediction and synthesis, action recognition and gait analysis.

  2. MovieQA Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Feb 7, 2021
    Cite
    Makarand Tapaswi; Yukun Zhu; Rainer Stiefelhagen; Antonio Torralba; Raquel Urtasun; Sanja Fidler (2021). MovieQA Dataset [Dataset]. https://paperswithcode.com/dataset/movieqa
    Dataset updated
    Feb 7, 2021
    Authors
    Makarand Tapaswi; Yukun Zhu; Rainer Stiefelhagen; Antonio Torralba; Raquel Urtasun; Sanja Fidler
    Description

    The MovieQA dataset is a dataset for movie question answering, designed to evaluate automatic story comprehension from both video and text. It consists of almost 15,000 multiple-choice questions obtained from over 400 movies and features high semantic diversity. Each question comes with a set of five highly plausible answers, only one of which is correct. The questions can be answered using multiple sources of information: movie clips, plots, subtitles, and, for a subset, scripts and DVS.

  3. movies-dataset

    • huggingface.co
    Updated Mar 27, 2025
    Cite
    sychonix (2025). movies-dataset [Dataset]. https://huggingface.co/datasets/sychonix/movies-dataset
    Dataset updated
    Mar 27, 2025
    Authors
    sychonix
    Description

    sychonix/movies-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. IMDB 5000 Movie Dataset

    • kaggle.com
    zip
    Updated Dec 16, 2017
    Cite
    Yueming (2017). IMDB 5000 Movie Dataset [Dataset]. https://www.kaggle.com/datasets/carolzhangdc/imdb-5000-movie-dataset/code?datasetId=7181&sortBy=voteCount
    Available download formats: zip (567524 bytes)
    Dataset updated
    Dec 16, 2017
    Authors
    Yueming
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by Yueming

    Released under Database: Open Database, Contents: Database Contents

    Contents

  5. Data from: Video Recommendations Based on Visual Features Extracted with...

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Jun 2, 2021
    Cite
    Tord Kvifte (2021). Video Recommendations Based on Visual Features Extracted with Deep Learning [Dataset]. http://doi.org/10.5281/zenodo.4889729
    Available download formats: bin, zip
    Dataset updated
    Jun 2, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tord Kvifte
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains visual features extracted from 12875 movie trailers. The visual features are extracted from key-frames of movie trailers with the VGG-19 CNN, pre-trained on ImageNet.

    Movies in the dataset are identified by their MovieLens movieId.

    • Features_sparse.zip contains the 4096-dimensional feature vectors of each key-frame from every movie.
    • Visual labels.zip contains the 1000-dimensional label feature vectors of each key-frame from every movie.
    • DeepCineProp-f.p combines the label features of each movie into a vector space model using tf-idf.
    • CineSub.p contains the subtitles of each movie represented in a vector space model, pre-processed with various NLP techniques and produced using tf-idf.
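
As a rough illustration of how per-movie label features could be combined into a tf-idf vector space model, here is a minimal sketch. The movie IDs and label names are invented for the example, and the weighting (raw term frequency times log(N / df)) is one common tf-idf variant, not necessarily the one used to build DeepCineProp-f.p.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return {doc_id: {term: weight}} using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))
    weights = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        weights[doc_id] = {
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return weights


# Hypothetical key-frame labels per MovieLens movieId (illustrative only).
labels_by_movie = {
    1: ["car", "car", "street", "person"],
    2: ["person", "beach", "person"],
    3: ["car", "person", "forest"],
}
vectors = tfidf(labels_by_movie)
```

A label that appears in every movie (here "person") gets weight zero, while labels that are rare across movies are weighted up.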

    Abstract:


    When a movie is uploaded to a movie recommender system (e.g., YouTube), the system can exploit various forms of descriptive features (e.g., tags and genre) in order to generate personalized recommendations for users. However, there are situations where the descriptive features are missing or very limited, and the system may fail to include such a movie in the recommendation list; this is known as the cold-start problem. This thesis investigates recommendation based on a novel form of content features, extracted from movies, in order to generate recommendations for users. Such features represent the visual aspects of movies, are based on Deep Learning models, and hence do not require any human annotation when extracted. The proposed technique has been evaluated in both offline and online evaluations using a large dataset of movies. The online evaluation has been carried out in an evaluation framework developed for this thesis. Results from the offline and online evaluation (N=150) show that automatically extracted visual features can mitigate the cold-start problem by generating recommendations of superior quality compared to different baselines, including recommendations based on human-annotated features. The results also point to subtitles as a high-quality future source of automatically extracted features.

  6. Moviegalaxies – Social Networks in Movies

    • marketplace.sshopencloud.eu
    • dataverse.harvard.edu
    • +1 more
    Updated Feb 11, 2022
    Cite
    (2022). Moviegalaxies – Social Networks in Movies [Dataset]. http://doi.org/10.7910/DVN/T4HBA3
    Dataset updated
    Feb 11, 2022
    Description

    This repository contains network graphs and network metadata from Moviegalaxies, a website providing network graph data from about 773 films (1915–2012). The data includes individual network graph data in Graph Exchange XML Format and descriptive statistics on measures such as clustering coefficient, degree, density, diameter, modularity, average path length, the total number of edges, and the total number of nodes.
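
To make a few of the listed measures concrete, the sketch below computes node count, edge count, degree and density for a small graph. It assumes an undirected graph without self-loops or duplicate edges, given as a plain edge list; the character names are invented, and reading the actual Graph Exchange XML files is not covered here.

```python
def graph_stats(edges):
    """Basic statistics for an undirected simple graph given as (u, v) pairs."""
    nodes = set()
    for u, v in edges:
        nodes.update((u, v))
    n, m = len(nodes), len(edges)
    # Degree: number of edges touching each node.
    degree = {node: 0 for node in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Density: share of all possible node pairs that are connected.
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return {"nodes": n, "edges": m, "density": density, "degree": degree}


# Hypothetical character co-occurrence edges (illustrative only).
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
stats = graph_stats(edges)
```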

  7. Replication Data for: Movie Scripts Corpus

    • dataverse.harvard.edu
    • search.dataone.org
    • +1 more
    Updated May 6, 2024
    Cite
    Lance Drouet (2024). Replication Data for: Movie Scripts Corpus [Dataset]. http://doi.org/10.7910/DVN/PZTL2L
    Dataset updated
    May 6, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Lance Drouet
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data Source: https://www.kaggle.com/datasets/gufukuro/movie-scripts-corpus

    Data Description: Movie Scripts Corpus

    This corpus was collected to use for screenplay analysis with machine learning methods. Corpus includes movie scripts, crawled from different sources, their annotations by script structural elements and movies metadata.

    Corpus description

    Screenplay data consists of:

    • Movie scripts TXT-documents with raw full text (2858 docs)
    • Movie scripts TXT-documents with full text lemmas (2858 docs)
    • Manual annotation TXT-documents for some movie scripts (33 docs, more than 6000 annotated rows)
    • Movie scripts annotations TXT-documents obtained by BERT
    • Movie scripts annotations json-documents obtained by rule-based annotator ScreenPy

    Movies metadata consists of:

    • Cut versions of movie reviews and scores from metacritic (number of reviews: 21025; number of movies with reviews: 2038)
    • Metadata for movies, including: title, akas, launch year, score from metacritic, imdb user rating and number of votes from imdb.com, movie awards, opening weekend, producers, budget, script department, production companies, writers, directors, cast info, countries involved in production, age restrict, plot (with outline), keywords, genres, taglines, critics' synopsis
    • Screenplay awards information: Academy Awards adapted screenplay, Academy Awards original screenplay, BAFTA, Golden Globe Award for Best Screenplay, Writers Guild Awards Winners & Nominees 2020-2013 nominations information for 462 movies in total

    Movie characters data consists of:

    • Script text fragments with dialogs and scene descriptions for characters, gathered with annotators: 2153 movies and text fragments for 32114 characters in total
    • Gender labels for 4792 characters

  8. CMU Movie Summary Corpus Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated May 7, 2013
    Cite
    David Bamman; Brendan O'Connor; Noah A. Smith (2013). CMU Movie Summary Corpus Dataset [Dataset]. https://paperswithcode.com/dataset/cmu-movie-summary-corpus
    Dataset updated
    May 7, 2013
    Authors
    David Bamman; Brendan O'Connor; Noah A. Smith
    Description

    Dataset [46 M] and readme: 42,306 movie plot summaries extracted from Wikipedia + aligned metadata extracted from Freebase, including:

    • Movie box office revenue, genre, release date, runtime, and language
    • Character names and aligned information about the actors who portray them, including gender and estimated age at the time of the movie's release

    Supplement: Stanford CoreNLP-processed summaries [628 M]. All of the plot summaries from above, run through the Stanford CoreNLP pipeline (tagging, parsing, NER and coref).

  9. Film Circulation dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 12, 2024
    Cite
    Samoilova, Evgenia (Zhenya) (2024). Film Circulation dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7887671
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Samoilova, Evgenia (Zhenya)
    Loist, Skadi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”

    A peer-reviewed data paper for this dataset is in review to be published in NECSUS_European Journal of Media Studies - an open access journal aiming at enhancing data transparency and reusability, and will be available from https://necsus-ejms.org/ and https://mediarep.org

    Please cite this when using the dataset.

    Detailed description of the dataset:

    1 Film Dataset: Festival Programs

    The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

    The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.

    The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.

    The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
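
The relationship between the long and the wide table can be sketched as follows: each film keeps only the row of the first sample festival where it appeared. This is an illustration under assumptions, not the authors' procedure; `fest` is the variable named above, while `film_id`, `fest_date` and the sample rows are hypothetical.

```python
def long_to_wide(rows):
    """Keep, for each film ID, the row of its earliest sample festival."""
    wide = {}
    for row in sorted(rows, key=lambda r: r["fest_date"]):
        # setdefault keeps the first (earliest) row seen for each film.
        wide.setdefault(row["film_id"], row)
    return list(wide.values())


# Hypothetical long-format rows: one row per film and festival appearance.
long_rows = [
    {"film_id": "F001", "fest": "Frameline", "fest_date": "2017-06"},
    {"film_id": "F001", "fest": "Berlinale", "fest_date": "2017-02"},
    {"film_id": "F002", "fest": "Frameline", "fest_date": "2017-06"},
]
wide_rows = long_to_wide(long_rows)
```

Film F001 appears twice in the long rows but once in the result, attributed to Berlinale because that festival comes first in the year.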

    2 Survey Dataset

    The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

    The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.

    The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.

    The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.

    3 IMDb & Scripts

    The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.

    The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.

    The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.

    The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.

    The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.

    The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to each crew member. The file also contains binary gender assigned to directors based on their first names using the GenderizeR application.

    The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.

    The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.

    The csv file “3_imdb-dataset_release-info_long” contains data about non-festival releases (e.g., theatrical, digital, TV, DVD/Blu-ray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.

    The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.

    The dataset includes 8 text files containing the script for webscraping. They were written using the R-3.6.3 version for Windows.

    The R script “r_1_unite_data” demonstrates the structure of the dataset, that we use in the following steps to identify, scrape, and match the film data.

    The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in “r_1_unite_data” and uses various R packages to create a search URL for each film from the core dataset on the IMDb website. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then potentially using an alternative title and a basic search if no matches are found in the advanced search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records from the IMDb website. The script then defines a loop that matches (including matching scores) each film in the core dataset with suggested films on the IMDb search page. Matching was done on directors, production year (+/- one year), and title, using a fuzzy matching approach with two methods, “cosine” and “osa”: cosine similarity is used to match titles with a high degree of similarity, and the OSA algorithm is used to match titles that may have typos or minor variations.
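
The two title-matching measures named above can be sketched in a few lines. This is a Python illustration, not the authors' R code: the q-gram size (2), the absence of any normalization, and the function names are assumptions made for the example.

```python
import math
from collections import Counter


def osa_distance(a, b):
    """Optimal string alignment distance: edits plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Adjacent transposition, e.g. "li" <-> "il".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def cosine_similarity(a, b, q=2):
    """Cosine similarity between character q-gram count vectors."""
    def grams(s):
        return Counter(s[k:k + q] for k in range(len(s) - q + 1))

    ga, gb = grams(a), grams(b)
    dot = sum(ga[g] * gb[g] for g in ga)
    norm_a = math.sqrt(sum(v * v for v in ga.values()))
    norm_b = math.sqrt(sum(v * v for v in gb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def year_matches(core_year, imdb_year):
    """Production years may differ by at most one year."""
    return abs(core_year - imdb_year) <= 1
```

A swapped pair of letters, as in "amelie" versus "ameile", costs a single OSA edit, which is why this measure suits titles with typos.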

    The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into one of the following five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible doubles in the dataset and identifies them for a manual check.

    The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.

    The script “r_5a_extracting_info_sample” uses the function defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. This script does that for the first 100 films to check whether everything works. Scraping the entire dataset took a few hours, so a test with a subsample of 100 films is advisable.

    The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.

    The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time, to make sure that the errors were not caused by disruptions in the internet connection or other technical issues.

    The script “r_check_logs” is used for troubleshooting and tracking the progress of all of the R scripts used. It gives information on the amount of missing values and errors.

    4 Festival Library Dataset

    The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.

    The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definition of variables, such as location and festival name, and festival categories, units of measurement, data sources and coding and missing data.

    The csv file “4_festival-library_dataset_imdb-and-survey” contains data on all unique festivals collected from both IMDb and survey sources. This dataset appears in wide format, all information for each festival is listed in one row. This

  10. movie lens 1 million

    • kaggle.com
    zip
    Updated Jul 20, 2020
    Cite
    Rushikesh Wayal (2020). movie lens 1 million [Dataset]. https://www.kaggle.com/datasets/luffyluffyluffy/movie-lens-1-million
    Available download formats: zip (6111648 bytes)
    Dataset updated
    Jul 20, 2020
    Authors
    Rushikesh Wayal
    Description

    Dataset

    This dataset was created by Rushikesh Wayal

    Contents

  11. MOVIES DATABASE: COLLECTING SCENES FOR LOCATIONS AND MATERIALITY CHAPTERS

    • uvaauas.figshare.com
    • figshare.com
    zip
    Updated Jun 21, 2024
    Cite
    J. Martin Alonso (2024). MOVIES DATABASE: COLLECTING SCENES FOR LOCATIONS AND MATERIALITY CHAPTERS [Dataset]. http://doi.org/10.21942/uva.25930837.v1
    Available download formats: zip
    Dataset updated
    Jun 21, 2024
    Dataset provided by
    University of Amsterdam / Amsterdam University of Applied Sciences
    Authors
    J. Martin Alonso
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Excel database encompassing scenes from each of the movies included in the research for the thesis regarding locations and objects depicted in them.

  12. imdb_reviews

    • tensorflow.org
    Updated Sep 20, 2024
    + more versions
    Cite
    (2024). imdb_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/imdb_reviews
    Dataset updated
    Sep 20, 2024
    Description

    Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split of the dataset.
    ds = tfds.load('imdb_reviews', split='train')

    # Print the first four examples.
    for ex in ds.take(4):
        print(ex)

    See the guide for more information on tensorflow_datasets.

  13. MovieLens 1M

    • grouplens.org
    • kaggle.com
    Updated Mar 19, 2016
    + more versions
    Cite
    (2016). MovieLens 1M [Dataset]. https://grouplens.org/datasets/movielens/1m/
    Dataset updated
    Mar 19, 2016
    Description

    Stable benchmark dataset. 1 million ratings from 6000 users on 4000 movies. Released 2/2003.

  14. movie-posters

    • huggingface.co
    Updated Apr 20, 2023
    Cite
    Pinecone (2023). movie-posters [Dataset]. https://huggingface.co/datasets/pinecone/movie-posters
    Dataset updated
    Apr 20, 2023
    Dataset authored and provided by
    Pinecone
    Description

    pinecone/movie-posters dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. Bollywood Movies data

    • data.mendeley.com
    Updated May 12, 2020
    + more versions
    Cite
    Bollywood Movies data [Dataset]. https://data.mendeley.com/datasets/3c57btcxy9/1
    Dataset updated
    May 12, 2020
    Authors
    Prashant Premkumar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Using a Python script to scrape data from the web, we collected data on all 1698 Hindi-language movies released in India over a 13-year period (2005-2017) from the Box Office India website.

  16. MovieLens Dataset

    • paperswithcode.com
    Updated Feb 7, 2021
    Cite
    F. Maxwell Harper; Joseph A. Konstan (2021). MovieLens Dataset [Dataset]. https://paperswithcode.com/dataset/movielens
    Dataset updated
    Feb 7, 2021
    Authors
    F. Maxwell Harper; Joseph A. Konstan
    Description

    The MovieLens datasets, first released in 1998, describe people’s expressed preferences for movies. These preferences take the form of tuples, each the result of a person expressing a preference (a 0-5 star rating) for a movie at a particular time. These preferences were entered by way of the MovieLens web site, a recommender system that asks its users to give movie ratings in order to receive personalized movie recommendations.

  17. Data from: MDD Dataset

    • paperswithcode.com
    Updated Jan 27, 2021
    + more versions
    Cite
    Jesse Dodge; Andreea Gane; Xiang Zhang; Antoine Bordes; Sumit Chopra; Alexander Miller; Arthur Szlam; Jason Weston (2021). MDD Dataset [Dataset]. https://paperswithcode.com/dataset/mdd
    Dataset updated
    Jan 27, 2021
    Authors
    Jesse Dodge; Andreea Gane; Xiang Zhang; Antoine Bordes; Sumit Chopra; Alexander Miller; Arthur Szlam; Jason Weston
    Description

    Movie Dialog dataset (MDD) is designed to measure how well models can perform at goal and non-goal orientated dialog centered around the topic of movies (question answering, recommendation and discussion).

  18. movielens

    • tensorflow.org
    • opendatalab.com
    • +1 more
    Updated Jul 8, 2020
    Cite
    (2020). movielens [Dataset]. https://www.tensorflow.org/datasets/catalog/movielens
    Dataset updated
    Jul 8, 2020
    Description

    This dataset contains a set of movie ratings from the MovieLens website, a movie recommendation service. This dataset was collected and maintained by GroupLens, a research group at the University of Minnesota. There are 5 versions included: "25m", "latest-small", "100k", "1m", "20m". In all datasets, the movies data and ratings data are joined on "movieId". The 25m dataset, latest-small dataset, and 20m dataset contain only movie data and rating data. The 1m dataset and 100k dataset contain demographic data in addition to movie and rating data.

    • "25m": This is the latest stable version of the MovieLens dataset. It is recommended for research purposes.
    • "latest-small": This is a small subset of the latest version of the MovieLens dataset. It is changed and updated over time by GroupLens.
    • "100k": This is the oldest version of the MovieLens datasets. It is a small dataset with demographic data.
    • "1m": This is the largest MovieLens dataset that contains demographic data.
    • "20m": This is one of the most used MovieLens datasets in academic papers along with the 1m dataset.

    For each version, users can view either only the movies data by adding the "-movies" suffix (e.g. "25m-movies") or the ratings data joined with the movies data (and users data in the 1m and 100k datasets) by adding the "-ratings" suffix (e.g. "25m-ratings").
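
The version-plus-suffix naming can be captured in a small helper. This is a sketch: `movielens_config` is a hypothetical name, and passing the result to `tfds.load` as `movielens/<config>` assumes the usual `dataset/config` naming of tensorflow_datasets.

```python
VERSIONS = ("25m", "latest-small", "100k", "1m", "20m")
KINDS = ("movies", "ratings")


def movielens_config(version, kind):
    """Build a config name such as "25m-ratings" from a version and a suffix."""
    if version not in VERSIONS:
        raise ValueError(f"unknown version: {version!r}")
    if kind not in KINDS:
        raise ValueError(f"unknown suffix: {kind!r}")
    return f"{version}-{kind}"


# With tensorflow_datasets installed, the config would typically be selected
# with tfds.load(name, split="train"); that call is not executed here.
name = "movielens/" + movielens_config("100k", "ratings")
```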

    The features below are included in all versions with the "-ratings" suffix.

    • "movie_id": a unique identifier of the rated movie
    • "movie_title": the title of the rated movie with the release year in parentheses
    • "movie_genres": a sequence of genres to which the rated movie belongs
    • "user_id": a unique identifier of the user who made the rating
    • "user_rating": the score of the rating on a five-star scale
    • "timestamp": the timestamp of the ratings, represented in seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970

    The "100k-ratings" and "1m-ratings" versions in addition include the following demographic features.

    • "user_gender": gender of the user who made the rating; a true value corresponds to male
    • "bucketized_user_age": bucketized age values of the user who made the rating, the values and the corresponding ranges are:
      • 1: "Under 18"
      • 18: "18-24"
      • 25: "25-34"
      • 35: "35-44"
      • 45: "45-49"
      • 50: "50-55"
      • 56: "56+"
    • "user_occupation_label": the occupation of the user who made the rating represented by an integer-encoded label; labels are preprocessed to be consistent across different versions
    • "user_occupation_text": the occupation of the user who made the rating in the original string; different versions can have different set of raw text labels
    • "user_zip_code": the zip code of the user who made the rating

    In addition, the "100k-ratings" dataset also has a feature "raw_user_age", which gives the exact age of the user who made the rating.

    Datasets with the "-movies" suffix contain only "movie_id", "movie_title", and "movie_genres" features.
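
The encoded features described above can be made human-readable with a short helper. This is a sketch that assumes each example has already been converted to a plain Python dict of numbers and strings; `decode_rating`, `rated_at` and `age_range` are hypothetical names.

```python
from datetime import datetime, timezone

# Bucket codes and ranges as listed in the feature description.
AGE_BUCKETS = {
    1: "Under 18",
    18: "18-24",
    25: "25-34",
    35: "35-44",
    45: "45-49",
    50: "50-55",
    56: "56+",
}


def decode_rating(example):
    """Add a UTC datetime and, when present, an age range to a rating example."""
    out = dict(example)
    # "timestamp" is in seconds since 1970-01-01 00:00:00 UTC.
    out["rated_at"] = datetime.fromtimestamp(int(example["timestamp"]), tz=timezone.utc)
    # Only the "100k-ratings" and "1m-ratings" versions carry demographic data.
    if "bucketized_user_age" in example:
        out["age_range"] = AGE_BUCKETS[int(example["bucketized_user_age"])]
    return out


# Illustrative example values, not taken from the dataset.
decoded = decode_rating(
    {"movie_id": "1", "user_rating": 4.0, "timestamp": 978307200, "bucketized_user_age": 25.0}
)
```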

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split of the default configuration.
    ds = tfds.load('movielens', split='train')

    # Print the first four examples.
    for ex in ds.take(4):
        print(ex)

    See the guide for more information on tensorflow_datasets.

  19. Film Permits

    • catalog.data.gov
    • data.cityofnewyork.us
    • +4 more
    Updated Mar 22, 2025
    + more versions
    Cite
    data.cityofnewyork.us (2025). Film Permits [Dataset]. https://catalog.data.gov/dataset/film-permits
    Dataset updated
    Mar 22, 2025
    Dataset provided by
    data.cityofnewyork.us
    Description

    Permits are generally required when asserting the exclusive use of city property, like a sidewalk, a street, or a park. See http://www1.nyc.gov/site/mome/permits/when-permit-required.page

  20. Movie content release strategies worldwide 2024

    • statista.com
    Updated Nov 9, 2024
    Cite
    Statista (2024). Movie content release strategies worldwide 2024 [Dataset]. https://www.statista.com/statistics/1464311/movie-content-release-strategies/
    Dataset updated
    Nov 9, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2023 - Jan 2024
    Area covered
    Worldwide
    Description

    According to a survey done between December 2023 and January 2024, 83 percent of media insiders agreed with the statement that movies will be available on a premium VOD shortly after theaters. Similarly, 76 percent of respondents stated that windows between theaters and first pay premiere will be shorter. In contrast, over half of respondents disagreed with the thought that studios will increasingly release movies on direct-to-consumer SVOD services simultaneously with theaters.

MoVi: A Large Multipurpose Motion and Video Dataset: 48 scholarly articles cite this dataset.
