6 datasets found

Netflix Prize data
kaggle.com
zip
Updated Jul 19, 2017
Cite
Netflix (2017). Netflix Prize data [Dataset]. https://www.kaggle.com/netflix-inc/netflix-prize-data
Available download formats: zip (0 bytes)
Dataset updated: Jul 19, 2017
Dataset authored and provided by: Netflix (http://netflix.com/)
Description
Context

Netflix held the Netflix Prize open competition for the best algorithm to predict user ratings for films. The grand prize was $1,000,000 and was won by BellKor's Pragmatic Chaos team. This is the dataset that was used in that competition.

Content

This comes directly from the README:

TRAINING DATASET FILE DESCRIPTION

The file "training_set.tar" is a tar of a directory containing 17770 files, one per movie. The first line of each file contains the movie id followed by a colon. Each subsequent line in the file corresponds to a rating from a customer and its date in the following format:

CustomerID,Rating,Date

MovieIDs range from 1 to 17770 sequentially.

CustomerIDs range from 1 to 2649429, with gaps. There are 480189 users.

Ratings are on a five star (integral) scale from 1 to 5.

Dates have the format YYYY-MM-DD.
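The training format above can be read with a small parser. This is a minimal sketch, not part of the original README: the names `Rating` and `parse_ratings` are illustrative, and the sample lines are made up. Because it treats any line ending in a colon as a new movie header, it should also cope with several movies concatenated into one file.

```python
from datetime import date
from typing import Iterable, Iterator, NamedTuple


class Rating(NamedTuple):
    movie_id: int
    customer_id: int
    rating: int
    day: date


def parse_ratings(lines: Iterable[str]) -> Iterator[Rating]:
    """Yield one Rating per 'CustomerID,Rating,Date' line.

    A line ending in ':' sets the current movie id for the lines after it.
    """
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate stray blank lines
        if line.endswith(":"):
            movie_id = int(line[:-1])
            continue
        if movie_id is None:
            raise ValueError("rating line seen before any 'MovieID:' header")
        customer, stars, day = line.split(",")
        yield Rating(movie_id, int(customer), int(stars), date.fromisoformat(day))


# Made-up sample lines in the documented format.
sample = ["111:", "3245,3,2005-12-19", "5666,4,2005-12-23"]
ratings = list(parse_ratings(sample))
```

With 480189 users and 17770 movies, a real pipeline would likely stream the lines from disk rather than build a list as done here.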

MOVIES FILE DESCRIPTION

Movie information in "movie_titles.txt" is in the following format:

MovieID,YearOfRelease,Title

MovieIDs do not correspond to actual Netflix movie ids or IMDB movie ids.

YearOfRelease can range from 1890 to 2005 and may correspond to the release of the corresponding DVD, not necessarily its theatrical release.

Title is the Netflix movie title and may not correspond to titles used on other sites. Titles are in English.
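A line of "movie_titles.txt" could be parsed as sketched below. Two points are assumptions rather than statements from the README: that a title may itself contain commas (so the line is split on the first two commas only), and that a year field might not be numeric (in which case the sketch stores `None`). The names `Movie` and `parse_movie_line` and the sample lines are hypothetical.

```python
from typing import NamedTuple, Optional


class Movie(NamedTuple):
    movie_id: int
    year: Optional[int]
    title: str


def parse_movie_line(line: str) -> Movie:
    """Parse one 'MovieID,YearOfRelease,Title' line."""
    # Split at most twice so that commas inside the title are preserved.
    movie_id, year, title = line.rstrip("\n").split(",", 2)
    # Assumption: a non-numeric year means the value is unknown.
    return Movie(int(movie_id), int(year) if year.isdigit() else None, title)


# Made-up sample line; not taken from the dataset.
movie = parse_movie_line("42,1999,Example Title, The")
```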

QUALIFYING AND PREDICTION DATASET FILE DESCRIPTION

The qualifying dataset for the Netflix Prize is contained in the text file "qualifying.txt". It consists of lines indicating a movie id, followed by a colon, and then customer ids and rating dates, one per line for that movie id. The movie and customer ids are contained in the training set. Of course the ratings are withheld. There are no empty lines in the file.

MovieID1:
CustomerID11,Date11
CustomerID12,Date12
...
MovieID2:
CustomerID21,Date21
CustomerID22,Date22

For the Netflix Prize, your program must predict all the ratings the customers gave the movies in the qualifying dataset based on the information in the training dataset.

The format of your submitted prediction file follows the movie and customer id, date order of the qualifying dataset. However, your predicted rating takes the place of the corresponding customer id (and date), one per line.

For example, if the qualifying dataset looked like:

111:
3245,2005-12-19
5666,2005-12-23
6789,2005-03-14
225:
1234,2005-05-26
3456,2005-11-07

then a prediction file should look something like:

111:
3.0
3.4
4.0
225:
1.0
2.0

which predicts that customer 3245 would have rated movie 111 3.0 stars on the 19th of December, 2005, that customer 5666 would have rated it slightly higher at 3.4 stars on the 23rd of December, 2005, etc.

You must make predictions for all customers for all movies in the qualifying dataset.
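The mapping from qualifying lines to prediction lines can be sketched as follows. This is an illustration under assumptions: `format_predictions` and the `predict` callback are hypothetical names, the toy lookup table stands in for a real model, and the one-decimal output simply mirrors the README example rather than a stated precision requirement.

```python
from typing import Callable, Iterable, Iterator


def format_predictions(
    qualifying_lines: Iterable[str],
    predict: Callable[[int, int, str], float],
) -> Iterator[str]:
    """Echo 'MovieID:' headers and replace each 'CustomerID,Date' line
    with a predicted rating, preserving the original order."""
    movie_id = None
    for raw in qualifying_lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            movie_id = int(line[:-1])
            yield line
            continue
        if movie_id is None:
            raise ValueError("customer line seen before any 'MovieID:' header")
        customer, day = line.split(",")
        yield f"{predict(movie_id, int(customer), day):.1f}"


# The qualifying example from the README, with a toy lookup as the "model".
qualifying = [
    "111:", "3245,2005-12-19", "5666,2005-12-23", "6789,2005-03-14",
    "225:", "1234,2005-05-26", "3456,2005-11-07",
]
toy = {(111, 3245): 3.0, (111, 5666): 3.4, (111, 6789): 4.0,
       (225, 1234): 1.0, (225, 3456): 2.0}
output = list(format_predictions(qualifying, lambda m, c, d: toy[(m, c)]))
```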

THE PROBE DATASET FILE DESCRIPTION

To allow you to test your system before you submit a prediction set based on the qualifying dataset, we have provided a probe dataset in the file "probe.txt". This text file contains lines indicating a movie id, followed by a colon, and then customer ids, one per line for that movie id.

MovieID1:
CustomerID11
CustomerID12
...
MovieID2:
CustomerID21
CustomerID22

Like the qualifying dataset, the movie and customer id pairs are contained in the training set. However, unlike the qualifying dataset, the ratings (and dates) for each pair are contained in the training dataset.

If you wish, you may calculate the RMSE of your predictions against those ratings and compare your RMSE against the Cinematch RMSE on the same data. See http://www.netflixprize.com/faq#probe for that value.
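For reference, RMSE is the square root of the mean squared difference between predicted and actual ratings. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up predictions and ratings for three probe pairs.
score = rmse([3.0, 3.4, 4.0], [3, 4, 4])
```

To evaluate on the probe set, the actual ratings would be looked up in the training data for each movie and customer pair listed in "probe.txt".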

Acknowledgements

The training data came in 17,000+ files. In the interest of keeping files together and file sizes as low as possible, I combined them into four text files: combined_data_(1,2,3,4).txt

The contest was originally hosted at http://netflixprize.com/index.html

The dataset was downloaded from https://archive.org/download/nf_prize_dataset.tar

Inspiration

This is a fun dataset to work with. You can read about the winning algorithm by BellKor's Pragmatic Chaos here
World Bank: Education Data
kaggle.com
zip
Updated Mar 20, 2019
Cite
World Bank (2019). World Bank: Education Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-education
Available download formats: zip (0 bytes)
Dataset updated: Mar 20, 2019
Dataset authored and provided by: World Bank (http://worldbank.org/)
License: https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

Content

This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.

For more information, see the World Bank website.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population

http://data.worldbank.org/data-catalog/ed-stats

https://cloud.google.com/bigquery/public-data/world-bank-education

Citation: The World Bank: Education Statistics

Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner photo by @till_indeman from Unsplash.

Inspiration

Of total government spending, what percentage is spent on education?
Starbucks Locations Worldwide
kaggle.com
zip
Updated Feb 13, 2017
Cite
Starbucks (2017). Starbucks Locations Worldwide [Dataset]. https://www.kaggle.com/starbucks/store-locations
Available download formats: zip (1149144 bytes)
Dataset updated: Feb 13, 2017
Dataset authored and provided by: Starbucks (http://starbucks.com/)
Description
Context

Starbucks started as a roaster and retailer of whole bean and ground coffee, tea and spices with a single store in Seattle’s Pike Place Market in 1971. The company now operates more than 24,000 retail stores in 70 countries.

Content

This dataset includes a record for every Starbucks or subsidiary store location currently in operation as of February 2017.

Acknowledgements

This data was scraped from the Starbucks store locator webpage by GitHub user chrismeller.

Inspiration

What city or country has the highest number of Starbucks stores per capita? What two Starbucks locations are the closest in proximity to one another? What location on Earth is farthest from a Starbucks? How has Starbucks expanded overseas?
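One way to approach the "closest two locations" question is a great-circle distance plus a pairwise search. The sketch below is only an illustration: the listing does not describe the dataset's columns, so it works on plain (latitude, longitude) tuples with made-up coordinates, and the function names are hypothetical. It uses the haversine formula on a spherical Earth of radius 6371 km, which is an approximation.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def closest_pair(points: Sequence[Tuple[float, float]]) -> Tuple[int, int]:
    """Return the indices of the two closest points (brute force, O(n^2))."""
    return min(
        combinations(range(len(points)), 2),
        key=lambda ij: haversine_km(*points[ij[0]], *points[ij[1]]),
    )


# Made-up coordinates, not taken from the dataset.
pair = closest_pair([(0.0, 0.0), (10.0, 10.0), (0.1, 0.1)])
```

With tens of thousands of stores, the brute-force search compares hundreds of millions of pairs, so a spatial index would probably be a better fit for the full dataset.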
NY Emergency Response Incidents
kaggle.com
Updated Dec 2, 2019
Cite
City of New York (2019). NY Emergency Response Incidents [Dataset]. https://www.kaggle.com/new-york-city/ny-emergency-response-incidents/discussion
Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated: Dec 2, 2019
Dataset provided by: Kaggle (http://kaggle.com/)
Authors: City of New York
License: https://creativecommons.org/publicdomain/zero/1.0/
Area covered: New York, New York
Description
Content

Type and address of emergency incident to which OEM responded

Context

This is a dataset hosted by the City of New York. The city has an open data platform and updates its information according to the amount of data that is brought in. Explore New York City using Kaggle and all of the data sources available through the City of New York organization page!

Update Frequency: This dataset is updated monthly.

Acknowledgements

This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
Binance Crypto Klines
kaggle.com
zip
Updated Apr 8, 2018
Cite
Binance (2018). Binance Crypto Klines [Dataset]. https://www.kaggle.com/binance/binance-crypto-klines
Available download formats: zip (1033121370 bytes)
Dataset updated: Apr 8, 2018
Dataset authored and provided by: Binance (http://binance.com/)
License: https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Each file contains klines for a one-month period at one-minute intervals. File names follow the pattern mm-yyyy-SMB1SMB2 (e.g. 11-2017-XRPBTC).

This dataset currently contains only the XRP/BTC and ETH/USDT symbol pairs, but it will be expanded soon.
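The file naming pattern can be decoded with a short helper. This is a sketch with hypothetical names (`KlineFile`, `parse_kline_filename`); it expects the name without any file extension, since the listing does not state one. The symbol pair is kept as a single string because the pattern puts no separator between the two symbols.

```python
from typing import NamedTuple


class KlineFile(NamedTuple):
    month: int
    year: int
    pair: str


def parse_kline_filename(stem: str) -> KlineFile:
    """Split an 'mm-yyyy-SMB1SMB2' name into month, year and symbol pair."""
    month, year, pair = stem.split("-", 2)
    info = KlineFile(int(month), int(year), pair)
    if not 1 <= info.month <= 12:
        raise ValueError(f"month out of range in {stem!r}")
    return info


info = parse_kline_filename("11-2017-XRPBTC")
```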

Features

Open time -> timestamp (milliseconds)

Open price -> float

High price -> float

Low price -> float

Close price -> float

Volume -> float

Quote asset volume -> float

Close time -> timestamp (milliseconds)

Number of trades -> int

Taker buy base asset volume -> float

Taker buy quote asset volume -> float
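The feature list above maps naturally onto a typed record. The sketch below rests on assumptions the listing does not confirm: that each row holds the eleven fields in the listed order, that the timestamps are milliseconds since the Unix epoch, and that they are in UTC. The class and function names and the sample values are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Sequence

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class Kline:
    open_time: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float
    quote_asset_volume: float
    close_time: datetime
    trades: int
    taker_buy_base_volume: float
    taker_buy_quote_volume: float


def from_ms(value: str) -> datetime:
    """Convert epoch milliseconds to an aware datetime (UTC is assumed)."""
    return EPOCH + timedelta(milliseconds=int(value))


def parse_kline(fields: Sequence[str]) -> Kline:
    """Build a Kline from eleven string fields in the listed order."""
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    return Kline(
        from_ms(fields[0]), float(fields[1]), float(fields[2]),
        float(fields[3]), float(fields[4]), float(fields[5]),
        float(fields[6]), from_ms(fields[7]), int(fields[8]),
        float(fields[9]), float(fields[10]),
    )


# Made-up row; the values are not taken from the dataset.
row = ["1509494400000", "0.00003", "0.00004", "0.00002", "0.000035",
       "1000.0", "0.03", "1509494459999", "12", "400.0", "0.012"]
kline = parse_kline(row)
```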

Acknowledgements

This dataset was collected from the Binance exchange.

Inspiration

This dataset could inspire more efficient trading algorithms.
IBRD Statement Of Income FY2013
kaggle.com
zip
Updated Apr 9, 2019
Cite
World Bank (2019). IBRD Statement Of Income FY2013 [Dataset]. https://www.kaggle.com/theworldbank/ibrd-statement-of-income-fy2013
Available download formats: zip (3239 bytes)
Dataset updated: Apr 9, 2019
Dataset authored and provided by: World Bank (http://worldbank.org/)
License: https://creativecommons.org/publicdomain/zero/1.0/
Description
Content

Provides data from the IBRD Statement of Income for the fiscal years ended June 30, 2013, June 30, 2012 and June 30, 2011. The values are expressed in millions of U.S. dollars. Where applicable, changes have been made to certain line items on the FY 2012 income statement to conform with the current year's presentation, but the comparable prior years' data sets have not been adjusted to reflect the reclassification impact of those changes.

Context

This is a dataset hosted by the World Bank. The organization has an open data platform and updates its information according to the amount of data that is brought in. Explore World Bank's Financial Data using Kaggle and all of the data sources available through the World Bank organization page!

Update Frequency: This dataset is updated daily.

Acknowledgements

This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.

This dataset is distributed under a Creative Commons Attribution 3.0 IGO license.

Cover photo by Matt Artz on Unsplash
Unsplash Images are distributed under a unique Unsplash License.


