100+ datasets found

MoVi: A Large Multipurpose Motion and Video Dataset
borealisdata.ca
search.dataone.org
Updated Jun 29, 2021
Cite
Saeed Ghorbani; Kimia Mahdaviani; Anne Thaler; Konrad Kording; Douglas James Cook; Gunnar Blohm; Nikolaus F. Troje (2021). MoVi: A Large Multipurpose Motion and Video Dataset [Dataset]. http://doi.org/10.5683/SP2/JRHDRN
Unique identifier
https://doi.org/10.5683/SP2/JRHDRN
Dataset updated
Jun 29, 2021
Dataset provided by
Borealis
Authors
Saeed Ghorbani; Kimia Mahdaviani; Anne Thaler; Konrad Kording; Douglas James Cook; Gunnar Blohm; Nikolaus F. Troje
License
https://borealisdata.ca/api/datasets/:persistentId/versions/5.0/customlicense?persistentId=doi:10.5683/SP2/JRHDRN
Description
MoVi is the first human motion dataset to contain synchronized pose, pose-dependent shape and video recordings. The MoVi database can be applied in human pose estimation and tracking, human motion prediction and synthesis, action recognition and gait analysis.
MovieQA Dataset
paperswithcode.com
opendatalab.com
Updated Feb 7, 2021
Cite
Makarand Tapaswi; Yukun Zhu; Rainer Stiefelhagen; Antonio Torralba; Raquel Urtasun; Sanja Fidler (2021). MovieQA Dataset [Dataset]. https://paperswithcode.com/dataset/movieqa
Dataset updated
Feb 7, 2021
Authors
Makarand Tapaswi; Yukun Zhu; Rainer Stiefelhagen; Antonio Torralba; Raquel Urtasun; Sanja Fidler
Description
The MovieQA dataset is a movie question-answering dataset for evaluating automatic story comprehension from both video and text. The dataset consists of almost 15,000 multiple-choice question answers obtained from over 400 movies and features high semantic diversity. Each question comes with a set of five highly plausible answers, only one of which is correct. The questions can be answered using multiple sources of information: movie clips, plots, subtitles, and, for a subset, scripts and DVS.
movies-dataset
huggingface.co
Updated Mar 27, 2025
Cite
sychonix (2025). movies-dataset [Dataset]. https://huggingface.co/datasets/sychonix/movies-dataset
Dataset updated
Mar 27, 2025
Authors
sychonix
Description
sychonix/movies-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
IMDB 5000 Movie Dataset
kaggle.com
zip
Updated Dec 16, 2017
Cite
Yueming (2017). IMDB 5000 Movie Dataset [Dataset]. https://www.kaggle.com/datasets/carolzhangdc/imdb-5000-movie-dataset/code?datasetId=7181&sortBy=voteCount
Available download formats: zip (567524 bytes)
Dataset updated
Dec 16, 2017
Authors
Yueming
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Dataset

This dataset was created by Yueming

Released under Database: Open Database, Contents: Database Contents

Contents
Data from: Video Recommendations Based on Visual Features Extracted with...
zenodo.org
data.niaid.nih.gov
bin, zip
Updated Jun 2, 2021
Cite
Tord Kvifte (2021). Video Recommendations Based on Visual Features Extracted with Deep Learning [Dataset]. http://doi.org/10.5281/zenodo.4889729
Available download formats: bin, zip
Unique identifier
https://doi.org/10.5281/zenodo.4889729
Dataset updated
Jun 2, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Tord Kvifte
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains visual features extracted from 12875 movie trailers. The visual features are extracted from key-frames of movie trailers with the VGG-19 CNN, pre-trained on ImageNet.

Movies in the dataset are identified by their MovieLens movieId.

Features_sparse.zip contains the 4096-dimensional feature vectors of each key-frame from every movie.

Visual labels.zip contains the 1000-dimensional label feature vectors of each key-frame from every movie.

DeepCineProp-f.p combines the label features of each movie into a vector space model using tf-idf.

CineSub.p contains the subtitles of each movie represented in a vector space model, pre-processed with various NLP techniques and produced using tf-idf.
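
The tf-idf vector space model mentioned for DeepCineProp-f.p and CineSub.p can be illustrated with a minimal sketch. The toy labels, the movie ids, and the weighting variant (relative term frequency times log(N/df)) are assumptions for illustration only; the exact weighting used to build the dataset files is not stated here.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each movie id to a sparse {term: tf-idf weight} vector.

    docs: dict of movie_id -> non-empty list of tokens (e.g. predicted labels).
    """
    n_docs = len(docs)
    # Document frequency: in how many movies each term occurs at least once.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    vectors = {}
    for movie_id, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        vectors[movie_id] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors

# Hypothetical label sequences for three movies (not taken from the dataset).
toy_labels = {
    1: ["car", "street", "car", "suit"],
    2: ["beach", "suit", "boat"],
    3: ["car", "boat", "boat"],
}
vectors = tfidf_vectors(toy_labels)
```

In this toy input, a label that occurs in only one movie ("street") receives a higher weight than an equally frequent label shared by several movies ("suit"), which is the effect tf-idf is meant to produce.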

Abstract:

When a movie is uploaded to a movie recommender system (e.g., YouTube), the system can exploit various forms of descriptive features (e.g., tags and genre) in order to generate personalized recommendations for users. However, there are situations where the descriptive features are missing or very limited and the system may fail to include such a movie in the recommendation list, which is known as the cold-start problem. This thesis investigates recommendation based on a novel form of content features, extracted from movies, in order to generate recommendations for users. Such features represent the visual aspects of movies, based on Deep Learning models, and hence do not require any human annotation when extracted. The proposed technique has been evaluated in both offline and online evaluations using a large dataset of movies. The online evaluation has been carried out in an evaluation framework developed for this thesis. Results from the offline and online evaluation (N=150) show that automatically extracted visual features can mitigate the cold-start problem by generating recommendations of superior quality compared to different baselines, including recommendation based on human-annotated features. The results also point to subtitles as a high-quality future source of automatically extracted features.
Moviegalaxies – Social Networks in Movies
marketplace.sshopencloud.eu
dataverse.harvard.edu
+1more
Updated Feb 11, 2022
Cite
(2022). Moviegalaxies – Social Networks in Movies [Dataset]. http://doi.org/10.7910/DVN/T4HBA3
Unique identifier
https://doi.org/10.7910/DVN/T4HBA3
Dataset updated
Feb 11, 2022
Description
This repository contains network graphs and network metadata from Moviegalaxies, a website providing network graph data from about 773 films (1915–2012). The data includes individual network graph data in Graph Exchange XML Format and descriptive statistics on measures such as clustering coefficient, degree, density, diameter, modularity, average path length, the total number of edges, and the total number of nodes.
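
Several of the descriptive statistics listed above can be derived directly from an edge list. The following is a minimal sketch that assumes an undirected, unweighted character network; the node names are hypothetical, and reading the Graph Exchange XML files themselves is left out.

```python
from collections import defaultdict

def graph_summary(edges):
    """Node count, edge count, density and degrees of an undirected graph."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops in this sketch
        adjacency[a].add(b)
        adjacency[b].add(a)
    n_nodes = len(adjacency)
    # Each undirected edge appears in two adjacency sets.
    n_edges = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    # Density: share of all possible node pairs that are connected.
    density = 2 * n_edges / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    return {"nodes": n_nodes, "edges": n_edges, "density": density, "degree": degree}

# Hypothetical co-occurrence edges between four characters.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
summary = graph_summary(toy_edges)
```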
Replication Data for: Movie Scripts Corpus
dataverse.harvard.edu
search.dataone.org
+1more
Updated May 6, 2024
Cite
Lance Drouet (2024). Replication Data for: Movie Scripts Corpus [Dataset]. http://doi.org/10.7910/DVN/PZTL2L
Unique identifier
https://doi.org/10.7910/DVN/PZTL2L
Dataset updated
May 6, 2024
Dataset provided by
Harvard Dataverse
Authors
Lance Drouet
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Data Source: https://www.kaggle.com/datasets/gufukuro/movie-scripts-corpus

Data Description: Movie Scripts Corpus

This corpus was collected to use for screenplay analysis with machine learning methods. Corpus includes movie scripts, crawled from different sources, their annotations by script structural elements and movies metadata.

Corpus description

Screenplay data consists of:

Movie scripts TXT-documents with raw full text (2858 docs)
Movie scripts TXT-documents with full text lemmas (2858 docs)
Manual annotation TXT-documents for some movie scripts (33 docs, more than 6000 annotated rows)
Movie scripts annotations TXT-documents obtained by BERT
Movie scripts annotations json-documents obtained by rule-based annotator ScreenPy

Movies metadata consists of:

Cut versions of movie reviews and scores from metacritic: Number of reviews: 21025; Number of movies with reviews: 2038
Metadata for movies, including: title, akas, launch year, score from metacritic, imdb user rating and number of votes from imdb.com, movie awards, opening weekend, producers, budget, script department, production companies, writers, directors, cast info, countries involved in production, age restrict, plot (with outline), keywords, genres, taglines, critics' synopsis
Screenplay awards information: Academy Awards adapted screenplay, Academy Awards original screenplay, BAFTA, Golden Globe Award for Best Screenplay, Writers Guild Awards Winners & Nominees 2020-2013 nominations information for 462 movies in total.

Movie characters data consists of:

Script text fragments with dialogs and scene descriptions for characters, gathered with annotators: 2153 movies and text fragments for 32114 characters in total
Gender labels for 4792 characters
CMU Movie Summary Corpus Dataset
paperswithcode.com
opendatalab.com
Updated May 7, 2013
Cite
David Bamman; Brendan O'Connor; Noah A. Smith (2013). CMU Movie Summary Corpus Dataset [Dataset]. https://paperswithcode.com/dataset/cmu-movie-summary-corpus
Dataset updated
May 7, 2013
Authors
David Bamman; Brendan O'Connor; Noah A. Smith
Description
Dataset [46 M] and readme: 42,306 movie plot summaries extracted from Wikipedia + aligned metadata extracted from Freebase, including:

Movie box office revenue, genre, release date, runtime, and language
Character names and aligned information about the actors who portray them, including gender and estimated age at the time of the movie's release

Supplement: Stanford CoreNLP-processed summaries [628 M]. All of the plot summaries from above, run through the Stanford CoreNLP pipeline (tagging, parsing, NER and coref).
Film Circulation dataset
data.niaid.nih.gov
zenodo.org
Updated Jul 12, 2024
Cite
Samoilova, Evgenia (Zhenya) (2024). Film Circulation dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7887671
Dataset updated
Jul 12, 2024
Dataset provided by
Samoilova, Evgenia (Zhenya)
Loist, Skadi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”

A peer-reviewed data paper for this dataset is in review to be published in NECSUS_European Journal of Media Studies - an open access journal aiming at enhancing data transparency and reusability, and will be available from https://necsus-ejms.org/ and https://mediarep.org

Please cite this when using the dataset.

Detailed description of the dataset:

1 Film Dataset: Festival Programs

The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.

The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.

The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
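
The relation between the long and the wide table can be sketched as follows. This is only an illustration: the column names (`film_id`, `fest`, `year`, `month`) and the rows are hypothetical, and the actual variable names are defined in the codebook.

```python
# Hypothetical long-format rows: one row per film and sample festival.
long_rows = [
    {"film_id": 1, "fest": "Berlinale", "year": 2017, "month": 2},
    {"film_id": 1, "fest": "Frameline", "year": 2017, "month": 6},
    {"film_id": 2, "fest": "Frameline", "year": 2017, "month": 6},
]

def long_to_wide(rows):
    """Keep one row per film: the earliest sample festival it appeared in."""
    wide = {}
    for row in rows:
        current = wide.get(row["film_id"])
        when = (row["year"], row["month"])
        if current is None or when < (current["year"], current["month"]):
            wide[row["film_id"]] = dict(row)
    # One row per unique film id, ordered by id.
    return [wide[film_id] for film_id in sorted(wide)]

wide_rows = long_to_wide(long_rows)
```

With these toy rows, film 1 keeps "Berlinale" as its sample festival because that appearance comes first in the year, mirroring the Berlinale/Frameline example in the description.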

2 Survey Dataset

The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.

The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.

The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.

3 IMDb & Scripts

The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.

The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.

The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.

The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.

The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.

The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to each crew member. The file also contains binary gender assigned to directors based on their first names using the GenderizeR application.

The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.

The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.

The csv file “3_imdb-dataset_release-info_long” contains data about non-festival releases (e.g., theatrical, digital, TV, DVD/Blu-ray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.

The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.

The dataset includes 8 text files containing the scripts for web scraping. They were written using R version 3.6.3 for Windows.

The R script “r_1_unite_data” demonstrates the structure of the dataset, which we use in the following steps to identify, scrape, and match the film data.

The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in the “r_1_unite_data” and uses various R packages to create a search URL for each film from the core dataset on the IMDb website. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then potentially using an alternative title and a basic search if no matches are found in the advanced search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records from the IMDb website. The script then defines a loop that matches (including matching scores) each film in the core dataset with suggested films on the IMDb search page. Matching was done using data on directors, production year (+/- one year), and title, with a fuzzy matching approach that uses two methods, “cosine” and “osa”: cosine similarity is used to match titles with a high degree of similarity, and the OSA algorithm is used to match titles that may have typos or minor variations.
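
The two title-matching measures named above can be sketched in a few lines. This is an illustration in Python rather than the original R code; in particular, computing cosine similarity over character bigram counts is an assumption, since the description does not say how the R scripts build the title vectors.

```python
import math
from collections import Counter

def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

def cosine_similarity(a, b, q=2):
    """Cosine similarity between character q-gram count vectors (assumed q=2)."""
    grams_a = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    grams_b = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    dot = sum(count * grams_b[gram] for gram, count in grams_a.items())
    norm = (math.sqrt(sum(v * v for v in grams_a.values()))
            * math.sqrt(sum(v * v for v in grams_b.values())))
    return dot / norm if norm else 0.0
```

A swapped pair of letters, such as "amelie" versus "ameile", costs a single OSA edit, which is why this measure suits titles with typos, while the cosine score rewards titles that share most of their character sequences.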

The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into one of the following five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible doubles in the dataset and identifies them for a manual check.

The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.

The script “r_5a_extracting_info_sample” uses the function defined in the “r_4_scraping_functions” to scrape the IMDb data for the identified matches. This script does that for the first 100 films to check whether everything works. Scraping the entire dataset took a few hours, so a test with a subsample of 100 films is advisable.

The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.

The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time to make sure that the errors were not caused by disruptions in the internet connection or other technical issues.

The script “r_check_logs” is used for troubleshooting and tracking the progress of all of the R scripts used. It gives information on the amount of missing values and errors.

4 Festival Library Dataset

The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.

The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definition of variables, such as location and festival name, and festival categories, units of measurement, data sources and coding and missing data.

The csv file “4_festival-library_dataset_imdb-and-survey” contains data on all unique festivals collected from both IMDb and survey sources. This dataset appears in wide format, all information for each festival is listed in one row. This
movie lens 1 million
kaggle.com
zip
Updated Jul 20, 2020
Cite
Rushikesh Wayal (2020). movie lens 1 million [Dataset]. https://www.kaggle.com/datasets/luffyluffyluffy/movie-lens-1-million
Available download formats: zip (6111648 bytes)
Dataset updated
Jul 20, 2020
Authors
Rushikesh Wayal
Description
Dataset

This dataset was created by Rushikesh Wayal

Contents
MOVIES DATABASE: COLLECTING SCENES FOR LOCATIONS AND MATERIALITY CHAPTERS
uvaauas.figshare.com
figshare.com
zip
Updated Jun 21, 2024
Cite
J. Martin Alonso (2024). MOVIES DATABASE: COLLECTING SCENES FOR LOCATIONS AND MATERIALITY CHAPTERS [Dataset]. http://doi.org/10.21942/uva.25930837.v1
Available download formats: zip
Unique identifier
https://doi.org/10.21942/uva.25930837.v1
Dataset updated
Jun 21, 2024
Dataset provided by
University of Amsterdam / Amsterdam University of Applied Sciences
Authors
J. Martin Alonso
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Excel database encompassing scenes from each of the movies included in the research for the thesis regarding locations and objects depicted in them.
imdb_reviews
tensorflow.org
Updated Sep 20, 2024
Cite
(2024). imdb_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/imdb_reviews
Dataset updated
Sep 20, 2024
Description
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

To use this dataset:

    import tensorflow_datasets as tfds

    ds = tfds.load('imdb_reviews', split='train')
    for ex in ds.take(4):
        print(ex)

See the guide for more information on tensorflow_datasets.
MovieLens 1M
grouplens.org
kaggle.com
Updated Mar 19, 2016
Cite
(2016). MovieLens 1M [Dataset]. https://grouplens.org/datasets/movielens/1m/
Dataset updated
Mar 19, 2016
Description
Stable benchmark dataset. 1 million ratings from 6000 users on 4000 movies. Released 2/2003.
movie-posters
huggingface.co
Updated Apr 20, 2023
Cite
Pinecone (2023). movie-posters [Dataset]. https://huggingface.co/datasets/pinecone/movie-posters
Dataset updated
Apr 20, 2023
Dataset authored and provided by
Pinecone
Description
pinecone/movie-posters dataset hosted on Hugging Face and contributed by the HF Datasets community
Bollywood Movies data
data.mendeley.com
Updated May 12, 2020
Cite
Bollywood Movies data [Dataset]. https://data.mendeley.com/datasets/3c57btcxy9/1
Unique identifier
https://doi.org/10.17632/3c57btcxy9.1
Dataset updated
May 12, 2020
Authors
Prashant Premkumar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Using a Python script to scrape data from the web, we collected data pertaining to all 1698 Hindi language movies that released in India across a 13 year period (2005-2017) from the website of Box Office India.
MovieLens Dataset
paperswithcode.com
Updated Feb 7, 2021
Cite
F. Maxwell Harper; Joseph A. Konstan (2021). MovieLens Dataset [Dataset]. https://paperswithcode.com/dataset/movielens
Dataset updated
Feb 7, 2021
Authors
F. Maxwell Harper; Joseph A. Konstan
Description
The MovieLens datasets, first released in 1998, describe people’s expressed preferences for movies. These preferences take the form of tuples, each the result of a person expressing a preference (a 0-5 star rating) for a movie at a particular time. These preferences were entered by way of the MovieLens web site, a recommender system that asks its users to give movie ratings in order to receive personalized movie recommendations.
Data from: MDD Dataset
paperswithcode.com
Updated Jan 27, 2021
Cite
Jesse Dodge; Andreea Gane; Xiang Zhang; Antoine Bordes; Sumit Chopra; Alexander Miller; Arthur Szlam; Jason Weston (2021). MDD Dataset [Dataset]. https://paperswithcode.com/dataset/mdd
Dataset updated
Jan 27, 2021
Authors
Jesse Dodge; Andreea Gane; Xiang Zhang; Antoine Bordes; Sumit Chopra; Alexander Miller; Arthur Szlam; Jason Weston
Description
Movie Dialog dataset (MDD) is designed to measure how well models can perform at goal and non-goal orientated dialog centered around the topic of movies (question answering, recommendation and discussion).
movielens
tensorflow.org
opendatalab.com
+1more
Updated Jul 8, 2020
Cite
(2020). movielens [Dataset]. https://www.tensorflow.org/datasets/catalog/movielens
Dataset updated
Jul 8, 2020
Description
This dataset contains a set of movie ratings from the MovieLens website, a movie recommendation service. This dataset was collected and maintained by GroupLens, a research group at the University of Minnesota. There are 5 versions included: "25m", "latest-small", "100k", "1m", "20m". In all datasets, the movies data and ratings data are joined on "movieId". The 25m dataset, latest-small dataset, and 20m dataset contain only movie data and rating data. The 1m dataset and 100k dataset contain demographic data in addition to movie and rating data.

"25m": This is the latest stable version of the MovieLens dataset. It is recommended for research purposes.

"latest-small": This is a small subset of the latest version of the MovieLens dataset. It is changed and updated over time by GroupLens.

"100k": This is the oldest version of the MovieLens datasets. It is a small dataset with demographic data.

"1m": This is the largest MovieLens dataset that contains demographic data.

"20m": This is one of the most used MovieLens datasets in academic papers along with the 1m dataset.

For each version, users can view either only the movies data by adding the "-movies" suffix (e.g. "25m-movies") or the ratings data joined with the movies data (and users data in the 1m and 100k datasets) by adding the "-ratings" suffix (e.g. "25m-ratings").

The features below are included in all versions with the "-ratings" suffix.

"movie_id": a unique identifier of the rated movie

"movie_title": the title of the rated movie with the release year in parentheses

"movie_genres": a sequence of genres to which the rated movie belongs

"user_id": a unique identifier of the user who made the rating

"user_rating": the score of the rating on a five-star scale

"timestamp": the timestamp of the ratings, represented in seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970

The "100k-ratings" and "1m-ratings" versions in addition include the following demographic features.

"user_gender": gender of the user who made the rating; a true value corresponds to male

"bucketized_user_age": bucketized age values of the user who made the rating, the values and the corresponding ranges are:

1: "Under 18"

18: "18-24"

25: "25-34"

35: "35-44"

45: "45-49"

50: "50-55"

56: "56+"

"user_occupation_label": the occupation of the user who made the rating represented by an integer-encoded label; labels are preprocessed to be consistent across different versions

"user_occupation_text": the occupation of the user who made the rating in the original string; different versions can have different set of raw text labels

"user_zip_code": the zip code of the user who made the rating

In addition, the "100k-ratings" dataset also has a feature "raw_user_age", which is the exact age of the user who made the rating.

Datasets with the "-movies" suffix contain only "movie_id", "movie_title", and "movie_genres" features.
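
The encoded fields described above can be decoded with the standard library alone. The sketch below assumes a record has already been converted to plain Python values (tensorflow_datasets yields tensors by default); the example record and the helper name `describe_rating` are hypothetical.

```python
from datetime import datetime, timezone

# Age buckets as listed in the feature description above.
AGE_BUCKETS = {
    1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
    45: "45-49", 50: "50-55", 56: "56+",
}

def describe_rating(example):
    """Return a human-readable view of one "-ratings" record."""
    return {
        "movie_id": example["movie_id"],
        "user_rating": example["user_rating"],
        # "timestamp" is given in seconds since 1970-01-01 00:00 UTC.
        "rated_at": datetime.fromtimestamp(example["timestamp"], tz=timezone.utc),
        # Demographic fields exist only in the 100k and 1m versions.
        "age_range": AGE_BUCKETS.get(example.get("bucketized_user_age")),
        "is_male": example.get("user_gender"),
    }

# Hypothetical record, not taken from the dataset.
toy_example = {
    "movie_id": "1193",
    "user_rating": 5.0,
    "timestamp": 978300760,
    "bucketized_user_age": 25,
    "user_gender": True,
}
decoded = describe_rating(toy_example)
```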

To use this dataset:

    import tensorflow_datasets as tfds

    ds = tfds.load('movielens', split='train')
    for ex in ds.take(4):
        print(ex)

See the guide for more information on tensorflow_datasets.
Film Permits
catalog.data.gov
data.cityofnewyork.us
+4more
Updated Mar 22, 2025
Cite
data.cityofnewyork.us (2025). Film Permits [Dataset]. https://catalog.data.gov/dataset/film-permits
Dataset updated
Mar 22, 2025
Dataset provided by
data.cityofnewyork.us
Description
Permits are generally required when asserting the exclusive use of city property, like a sidewalk, a street, or a park. See http://www1.nyc.gov/site/mome/permits/when-permit-required.page
Movie content release strategies worldwide 2024
statista.com
Updated Nov 9, 2024
Cite
Statista (2024). Movie content release strategies worldwide 2024 [Dataset]. https://www.statista.com/statistics/1464311/movie-content-release-strategies/
Dataset updated
Nov 9, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Dec 2023 - Jan 2024
Area covered
Worldwide
Description
According to a survey done between December 2023 and January 2024, 83 percent of media insiders agreed with the statement that movies will be available on a premium VOD shortly after theaters. Similarly, 76 percent of respondents stated that windows between theaters and first pay premiere will be shorter. In contrast, over half of respondents disagreed with the thought that studios will increasingly release movies on direct-to-consumer SVOD services simultaneously with theaters.

MoVi: A Large Multipurpose Motion and Video Dataset

48 scholarly articles cite this dataset