Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Prime Video launched all the way back in 2006, originally called Amazon Unbox. At launch, it offered a way to store TV series and movies purchased from Amazon, with an instant video subscription...

About this Dataset: Amazon Prime is one of the most popular media and video streaming platforms. It has close to 10,000 movies and TV shows available and, as of mid-2021, over 200M subscribers globally. This tabular dataset lists all the movies and TV shows available on Amazon Prime, along with details such as cast, directors, ratings, release year, and duration.

Comprehensive dataset covering Amazon Prime availability across 27 countries, including launch dates, pricing, and regional benefit differences.

https://creativecommons.org/publicdomain/zero/1.0/
About this Dataset: Amazon Prime is one of the most popular media and video streaming platforms. It has close to 10,000 movies and TV shows available and, as of mid-2021, over 200M subscribers globally. This tabular dataset lists all the movies and TV shows available on Amazon Prime, along with details such as cast, directors, ratings, release year, and duration.
- Understanding what content is available in different countries
- Identifying similar content by matching text-based features
- Running network analysis on actors and directors to find interesting insights
- Checking whether Amazon Prime has focused more on TV shows than movies in recent years
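
As a rough illustration of matching titles on text-based features, the sketch below scores two descriptions by the Jaccard overlap of their word sets. The toy catalog and the function names are hypothetical and are not part of the dataset. Jaccard similarity is only one simple choice, and a real analysis might prefer TF-IDF or embeddings.

```python
import re


def tokens(text):
    """Lowercase a text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two texts: shared tokens divided by all tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0  # two empty texts: define the similarity as zero
    return len(ta & tb) / len(ta | tb)


def most_similar(query_title, catalog):
    """Return the other title whose description overlaps most with the query's."""
    query_text = catalog[query_title]
    scored = [
        (jaccard(query_text, text), title)
        for title, text in catalog.items()
        if title != query_title
    ]
    return max(scored)[1]


# Hypothetical toy catalog: title -> description.
catalog = {
    "Space Heist": "a crew of thieves plans a heist on a space station",
    "Orbit Robbers": "thieves plan a daring heist aboard an orbiting station",
    "Garden Tales": "a gentle documentary about english gardens",
}

print(most_similar("Space Heist", catalog))
```

The same scoring could be run over the description, cast, or genre fields, depending on which text features the dataset actually provides.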

In 2025, Amazon's net revenue from the subscription services segment amounted to 49.6 billion U.S. dollars. Subscription services include Amazon Prime, for which Amazon reported 230 million paying members worldwide at the end of 2023. The AWS category generated 128.7 billion U.S. dollars in annual sales. During the most recently reported fiscal year, the company’s net revenue amounted to 717 billion U.S. dollars.

Amazon revenue segments

Amazon is one of the biggest online companies worldwide. In 2019, the company’s revenue increased by 21 percent, compared to Google’s revenue growth during the same fiscal period, which was just 18 percent. The majority of Amazon’s net sales are generated through its North American business segment, which accounted for 236.3 billion U.S. dollars in 2020. The United States is the company’s leading market, followed by Germany and the United Kingdom.

Business segment: Amazon Web Services

Amazon Web Services, commonly referred to as AWS, is one of the strongest-growing business segments of Amazon. AWS is a cloud computing service that provides individuals, companies and governments with a wide range of computing, networking, storage, database, analytics and application services, among many others. As of the third quarter of 2020, AWS accounted for approximately 32 percent of the global cloud infrastructure services vendor market.

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Amazon is one of the most recognisable brands in the world, and the third largest by revenue. It was the fourth tech company to reach a $1 trillion market cap, and a market leader in e-commerce,...

https://creativecommons.org/publicdomain/zero/1.0/
This dataset appears to contain information about products sold on Amazon, including various attributes such as prices, ratings, availability, and sales volume. Here is a description of each column and potential analyses, modeling, and data science techniques you can use:
Column Descriptions
- asin: Amazon Standard Identification Number, a unique identifier for each product.
- product_title: The name or title of the product.
- product_price: The current price of the product.
- product_original_price: The original price of the product before any discounts.
- currency: The currency in which the product price is listed.
- product_star_rating: The average star rating of the product.
- product_num_ratings: The number of ratings the product has received.
- product_url: The URL to the product’s Amazon page.
- product_photo: A link to the product’s photo.
- product_num_offers: The number of different offers available for the product.
- product_minimum_offer_price: The minimum price of the offers available.
- is_best_seller: Indicator if the product is a best seller.
- is_amazon_choice: Indicator if the product is an Amazon's Choice product.
- is_prime: Indicator if the product is eligible for Amazon Prime.
- climate_pledge_friendly: Indicator if the product is labeled as Climate Pledge Friendly.
- sales_volume: The volume of sales for the product.
- delivery: Information about the delivery options for the product.
- has_variations: Indicator if the product has variations (e.g., different sizes or colors).
- product_availability: The availability status of the product.
- unit_price: The price per unit of measure.
- unit_count: The number of units included in the product price.
Potential Analyses and Data Science Techniques
Descriptive Statistics: Calculate summary statistics for numeric columns (e.g., average, median, min, max of prices, ratings, sales volume). Frequency counts for categorical columns (e.g., how many products are best sellers, Amazon's Choice, Prime eligible).
Price Analysis: Compare the current price to the original price to assess discount levels. Analyze pricing trends across different categories or brands.
Rating Analysis: Examine the distribution of product ratings. Correlate the number of ratings with the average star rating to identify trends.
Sales Volume Analysis: Identify top-selling products. Analyze sales volume in relation to pricing, rating, and other attributes.
Product Categorization: Group products based on categories such as best seller, Amazon's Choice, Prime eligibility, and Climate Pledge Friendly status. Perform clustering to identify patterns or segments among products.
Predictive Modeling: Price Prediction: Use regression models (e.g., linear regression, decision trees) to predict product prices based on features like ratings, number of offers, and best seller status. Sales Volume Prediction: Use regression or time series analysis to predict future sales volumes. Rating Prediction: Predict product ratings using features such as price, number of ratings, and best seller status.
Recommendation Systems: Build collaborative filtering or content-based recommendation systems to suggest products to customers based on their preferences and past behavior.
Classification Tasks: Classify products into different categories (e.g., best seller, Amazon's Choice) using classification algorithms (e.g., logistic regression, random forests, SVM).
Sentiment Analysis: Analyze customer reviews (if available) to gauge sentiment and correlate it with ratings and sales volume.
Market Basket Analysis: If purchase data is available, perform association rule mining to find frequently co-purchased items.
Visualization Techniques
Histograms and Bar Charts: For visualizing the distribution of prices, ratings, and sales volumes.
Box Plots: For comparing prices and ratings across different product categories.
Scatter Plots: To visualize relationships between numeric variables (e.g., price vs. sales volume).
Heatmaps: To show correlations between different features.
Data Cleaning and Preprocessing
Handle missing values (e.g., impute, remove). Convert categorical variables into numerical form using techniques like one-hot encoding. Normalize or scale numeric features if required for certain algorithms.
Advanced Techniques
Feature Engineering: Create new features from existing data (e.g., discount percentage from original and current prices).
Dimensionality Reduction: Use PCA or other techniques if the dataset has high dimensionality.
These analyses and techniques can help uncover valuable insights, optimize pricing strategies, improve customer satisfaction, and ultimately drive sales and profitability.
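
The discount-percentage feature mentioned above can be sketched as follows. The column names `product_price` and `product_original_price` come from the column list. The price format (strings such as `"$1,299.00"`), the sample rows, and the helper names are assumptions made for illustration, so the parsing may need adjusting to the real file.

```python
def parse_price(raw):
    """Convert a price such as '$1,299.00' to a float.

    Returns None for missing or unparseable values. The currency-symbol
    handling is an assumption about how prices are stored.
    """
    if raw is None:
        return None
    cleaned = str(raw).replace(",", "").strip().lstrip("$€£₹")
    try:
        return float(cleaned)
    except ValueError:
        return None


def discount_percent(current, original):
    """Discount as a percentage of the original price, or None if unknown."""
    cur, orig = parse_price(current), parse_price(original)
    if cur is None or orig is None or orig <= 0:
        return None
    return round((orig - cur) / orig * 100, 2)


# Hypothetical sample rows using the dataset's column names.
rows = [
    {"asin": "X1", "product_price": "$75.00", "product_original_price": "$100.00"},
    {"asin": "X2", "product_price": "$19.99", "product_original_price": None},
]

for row in rows:
    row["discount_pct"] = discount_percent(
        row["product_price"], row["product_original_price"]
    )

print([(row["asin"], row["discount_pct"]) for row in rows])
```

Returning `None` rather than 0 for rows without an original price keeps "no discount" distinct from "discount unknown" in later analysis.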

https://creativecommons.org/publicdomain/zero/1.0/
When I was in college, I developed a project in a Jupyter notebook that consumed data from the Netflix Prime Video Movies and TV Shows set. The idea was to clean and analyze this data and build a stage where I could recommend movies and TV shows.
I was very happy with the result, but I wanted more: I wanted to move this notebook into an application where I could interact with the project. So I created a personal project where I could apply what I had studied and learned over time.
But something was missing: how was I going to show the result of my project? During that time I discovered Streamlit. Incredible! It gave me a lot of flexibility, and because I can deploy on their platform, I can show what I built.
I want to thank Kaggle user @shivamb for making the sets below available. In addition to the Netflix set, there are 3 more.
From these 4 sets came the idea of creating a single one, to expand the data and produce more recommendations. Follow the link below.
4 Services Streaming Movies and Tv Shows
If you want to understand the process more, I have a post and 4 more notebooks where I explain the notebook I created.
You can check out the application I developed using Streamlit and using this data.
This dataset has two files:
all_streaming.csv: This file unifies the 4 sets above. It has a column that identifies each streaming service, plus a groups column in which I used the k-means algorithm to assign each entry to a group.
all_gender.csv: This file separates out all genres from the streaming sets, with one binary column per genre. It also has the groups column produced by the k-means algorithm.
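
To make the "one binary column per genre" layout concrete, here is a minimal sketch of how such columns could be derived from a comma-separated genre field. The field name `genres`, the separator, and the sample records are assumptions. This is not the author's notebook code, and the k-means step is not shown.

```python
def split_genres(raw, sep=","):
    """Split a genre string such as 'Comedy, Drama' into clean genre names."""
    return [g.strip() for g in raw.split(sep) if g.strip()]


def to_binary_genre_rows(records, field="genres"):
    """Return (sorted genre names, rows with a 0/1 value per genre)."""
    all_genres = sorted({g for rec in records for g in split_genres(rec[field])})
    rows = []
    for rec in records:
        present = set(split_genres(rec[field]))
        row = {"title": rec["title"]}
        # One indicator per genre: 1 if the title has it, otherwise 0.
        row.update({g: int(g in present) for g in all_genres})
        rows.append(row)
    return all_genres, rows


# Hypothetical sample records.
records = [
    {"title": "A", "genres": "Comedy, Drama"},
    {"title": "B", "genres": "Drama"},
]

genre_names, binary_rows = to_binary_genre_rows(records)
print(genre_names)
print(binary_rows)
```

Rows in this 0/1 form are the kind of numeric input a clustering algorithm such as k-means expects.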
You can find the notebook for this step at this link:

SUMMARY:
Vumonic provides its clients email receipt datasets on weekly, monthly, or quarterly subscriptions, for any online consumer vertical. We gain consent-based access to our users' email inboxes through our own proprietary apps, from which we gather and extract all the email receipts and put them into a structured format for consumption of our clients. We currently have over 1M users in our India panel.
If you are not familiar with email receipt data, it provides item and user-level transaction information (all PII-wiped), which allows for deep granular analysis of things like marketshare, growth, competitive intelligence, and more.
VERTICALS:
PRICING/QUOTE:
Our email receipt data is priced market-rate based on the requirement. To give a quote, all we need to know is:
Send us this info and we can answer any questions you have, provide a sample, and more.

This data set was created to analyze the latest shows available on Amazon Prime as well as the shows with a high rating.
The data set contains the following fields:
- Name of the show: the title.
- Year of release: the year in which the show was released or went on air.
- No. of seasons: the number of seasons of the show available on Prime.
- Language: the audio language of the show; subtitle languages are not considered.
- Genre: the genre of the show, such as Kids, Drama, or Action.
- IMDb rating: the IMDb rating of the show, though for many TV shows and kids' shows the rating was not available.
- Age of viewers: the age of the target audience. "All" means that the content is not restricted to any particular age group and all audiences can view it.
I have collected this data from Amazon Prime's Website.
Many TV shows have high IMDb ratings but are not widely viewed, because audiences are not aware of them or they are not advertised much. I created this data set to find the highest-rated shows in each category or genre.

This Dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: - We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper:

This data set provides high-resolution (~500 m) gridded land and stream drainage direction maps for the Amazon River basin, excluding the Rio Tocantins basin. These maps are the result of a new topography-independent analysis method (Mayorga et al., 2005) using the vector river network from the Digital Chart of the World (DCW, Danko, 1992) to create a high-resolution flow direction map.
The data products include (1) a stream network coverage with stream order assigned to each reach; (2) the basin boundaries of the major tributaries to the Amazon mainstem; (3) the mouths; and (4) the source points of these tributaries.
There are 7 ESRI ArcGIS shapefiles provided in compressed *.zip format and 4 GeoTiff image files with this data set.

Details of movies and shows on Amazon Prime Video in the India region. This dataset gives an idea of the content available by subscription compared with rental. Because it targets the India region, the ratings are those prevalent in India (U, U/A, A, etc.).

https://live.ece.utexas.edu/research/LIVE_APV_Study/apv_index.html
Video live streaming is gaining prevalence among video streaming services, especially for the delivery of popular sporting events. The quality of these live streaming videos can be adversely affected by any of a wide variety of events, including poor network connections, capture artifacts, and distortions incurred during coding and transmission. Because of this, objective Video Quality Assessment (VQA) algorithms that can predict the perceptual quality of videos have become important sources of feedback, monitoring, and control for video streaming. Important resources for developing these algorithms are appropriate databases that exemplify the kinds of live streaming video distortions encountered in practice. Towards making progress in this direction, we built a video quality database specifically designed for live streaming VQA research. The new video database is called the Laboratory for Image and Video Engineering - Amazon Prime Video (APV) Live Video Streaming Database (LIVE-APV). We envision that researchers will find the dataset to be useful for the development, testing, and comparison of future VQA models.

Amazon Prime is a highly renowned platform for media and video streaming, boasting a substantial library of approximately 10,000 movies and TV shows. As of mid-2021, the platform has garnered a remarkable global subscriber base of over 200 million. This tabular dataset encompasses comprehensive listings of all the available movies and TV shows on Amazon Prime, accompanied by pertinent details, including cast members, directors, ratings, release year, duration, and more.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: The replacement of tropical forests by production systems is one of the leading causes of riverine ecosystem alterations. However, the current composition of assemblages may also result from the time since these transformations began. Therefore, knowledge of diversified historical scenarios can facilitate actions that involve the recovery of aquatic environments. In this study, an inventory of stream fish was carried out in basins whose deforestation was intensified in the last 20 years, to compose a baseline for ecological and taxonomic studies. The habitat, physical and chemical variables, and the fish assemblages from 60 streams in the northwest region of the state of Mato Grosso, in the Aripuanã and Juruena river basins, were sampled with standardized procedures. Of a total of 130 species, a numerical predominance of small-sized Characidae and great rarity were registered, with 50 species represented by fewer than ten individuals and 19 singletons. Approximately 15% of the sampled taxa were identified only at the generic level, and for several taxa, more detailed taxonomic and molecular studies are required in order to achieve satisfactory identifications. No threatened species have been reported so far. On the other hand, two specimens of non-native species were sampled. Although habitat quality is higher in forested streams, no differences in species richness were registered when compared to the pasture streams with riparian forest or to more deforested streams. However, abundance was greater in these last two stream groups as a result of the dominance of small-sized characins.

About this Dataset: Disney+ is one of the most popular media and video streaming platforms. It has close to 1,300 movies and TV shows available and, as of mid-2021, over 116M subscribers globally. This tabular dataset lists all the movies and TV shows available on Disney+, along with details such as cast, directors, ratings, release year, and duration.

Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
About Dataset
This dataset contains information about a collection of movies across multiple genres, including Comedy, Romance, Thriller, and Drama. Each record in the dataset includes the following attributes:
- Title: The name of the movie.
- IMDb Rating: The IMDb rating of the movie, a measure of its popularity and quality, based on user reviews.
- Release Year: The year the movie was released.
- Duration: The length of the movie in minutes.
- Genre: The genre or category of the movie, such as Comedy, Drama, Thriller, or Romance.
The dataset covers movies spanning various genres and time periods, offering insights into movie ratings, durations, and genres. The data could be used for analysis in areas such as movie recommendations, trends in genre popularity, or the correlation between movie length and user ratings.
Here’s a detailed description of each column in your dataset:
- Title: This column contains the name of the movie. It serves as a unique identifier for each movie in the dataset. The titles represent a wide range of films from various genres and periods.
- IMDb Rating: This column represents the IMDb rating of each movie, which is a score given by users on the IMDb platform. The rating is typically out of 10 and reflects the overall user perception of the movie, including aspects such as storytelling, acting, direction, and entertainment value. Higher ratings generally indicate better reception by audiences.
- Release Year: This column indicates the year when the movie was officially released. It provides a temporal context for each movie, helping users understand when the movie was made and the era it belongs to. This can be useful for analyzing trends in the movie industry over time.
- Duration: This column contains the duration of each movie, measured in minutes. It indicates how long the movie runs from start to finish. This data is important for understanding the length of films, which can be a factor in viewers' preferences and movie industry trends.
- Genre: This column categorizes each movie based on its genre, such as Comedy, Drama, Kids Movies, or Romance. The genre provides insights into the movie's thematic focus and target audience. Genres help classify movies into broad categories, allowing for analysis of trends in different movie types over time.

https://creativecommons.org/publicdomain/zero/1.0/
This data set was created to list all shows available on Amazon Prime streaming and to analyze the data to find interesting facts. The data was acquired in May 2022 and covers titles available in the United States.
This dataset has two files, containing the titles (titles.csv) and the cast (credits.csv) for each title.
This dataset contains over 9k unique titles on Amazon Prime, with 15 columns of information:
- id: The title ID on JustWatch.
- title: The name of the title.
- show_type: TV show or movie.
- description: A brief description.
- release_year: The release year.
- age_certification: The age certification.
- runtime: The length of the episode (SHOW) or movie.
- genres: A list of genres.
- production_countries: A list of countries that produced the title.
- seasons: Number of seasons if it's a SHOW.
- imdb_id: The title ID on IMDB.
- imdb_score: Score on IMDB.
- imdb_votes: Votes on IMDB.
- tmdb_popularity: Popularity on TMDB.
- tmdb_score: Score on TMDB.
There are also over 124k credits of actors and directors on Amazon Prime titles, with 5 columns of information:
- person_ID: The person ID on JustWatch.
- id: The title ID on JustWatch.
- name: The actor or director's name.
- character_name: The character name.
- role: ACTOR or DIRECTOR.
- Developing a content-based recommender system using the genres and/or descriptions.
- Identifying the main content available on the streaming.
- Network analysis on the cast of the titles.
- Exploratory data analysis to find interesting insights.
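
Because both files carry the JustWatch title `id`, credits can be joined to titles as a first step towards cast analysis. The sketch below uses only the standard library and tiny inline stand-ins for titles.csv and credits.csv. The sample rows and function names are hypothetical, while the column names follow the lists above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-ins for titles.csv and credits.csv (a subset of columns).
TITLES_CSV = """id,title,show_type
t1,Example Movie,MOVIE
t2,Example Show,SHOW
"""

CREDITS_CSV = """person_ID,id,name,character_name,role
p1,t1,Alex Doe,Hero,ACTOR
p2,t1,Sam Roe,,DIRECTOR
p1,t2,Alex Doe,Guest,ACTOR
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def titles_by_person(titles, credits):
    """Map each person's name to the (title, role) pairs they are credited for."""
    title_name = {row["id"]: row["title"] for row in titles}
    result = defaultdict(list)
    for credit in credits:
        if credit["id"] in title_name:  # skip credits with no matching title
            result[credit["name"]].append((title_name[credit["id"]], credit["role"]))
    return dict(result)


filmography = titles_by_person(load_rows(TITLES_CSV), load_rows(CREDITS_CSV))
print(filmography["Alex Doe"])
```

With the real files, `load_rows` could be replaced by opening each CSV from disk. Pairs of people who share a title would then form the edges of a cast network.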
If you want to see how I obtained these data, please check my GitHub repository.
All data were collected from JustWatch.

https://creativecommons.org/publicdomain/zero/1.0/
📖 Dataset Description
This dataset contains 200 Amazon Prime movies along with their detailed metadata. Each entry provides essential information such as IMDb ratings, genres, release year, and movie synopsis. The dataset is designed for researchers, data analysts, and machine learning enthusiasts who want to explore insights about movies on streaming platforms.
🔑 Features
title – Movie title
imdbrating – IMDb rating (float)
released – Release year (integer)
genre – List of genres (e.g., Action, Comedy, Drama)
imdbid – Unique IMDb movie identifier
synopsis – Short plot summary (may have some missing values)
imageurl – Movie poster link
type – Type of content (mostly movies)
🎯 Use Cases
Exploratory Data Analysis (EDA) on movie trends
Genre-based rating comparison
NLP tasks on movie synopsis
Recommendation systems
Data visualization projects
⚠️ Notes
Some synopsis values are missing.
genre and imageurl were originally stored as list-like strings and can be cleaned for better use.
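
One way to clean such list-like strings is `ast.literal_eval` from the standard library, which safely parses Python literals. The sketch below assumes values shaped like `"['Action', 'Comedy']"`. The exact stored format is not documented here, so the function name and the fallbacks are illustrative.

```python
import ast


def parse_list_like(raw):
    """Turn a list-like string such as "['Action', 'Comedy']" into a list of strings.

    Empty or missing values give an empty list. Text that is not a valid
    Python literal is kept as a single-item list.
    """
    if raw is None or not str(raw).strip():
        return []
    try:
        value = ast.literal_eval(str(raw))
    except (ValueError, SyntaxError):
        return [str(raw).strip()]
    if isinstance(value, (list, tuple)):
        return [str(item).strip() for item in value]
    return [str(value).strip()]


print(parse_list_like("['Action', 'Comedy']"))
print(parse_list_like("Drama"))
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for strings read from a data file.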
provided dataset in machine readable format