Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The dataset was collected as part of Prac1 of the subject Typology and Data Life Cycle of the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).
The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).
The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset
The data was retrieved in two sets: the first 30000 books, then the remaining 22478. Dates were not parsed and reformatted in the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and in Month Day Year format for the rest.
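Because the two chunks use different date styles, both date columns need normalizing before use. The following is a minimal sketch, not the dataset author's code. It assumes the second style looks like "September 14th 2008" (English month name, optional ordinal suffix or comma); the function name `parse_publish_date` and the example strings are hypothetical.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumed shapes of the two styles (hypothetical examples, not taken from the data):
#   "09/14/2008"          -> mm/dd/yyyy (first 30000 records)
#   "September 14th 2008" -> Month Day Year (remaining records)
_ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)\b")
_FORMATS = ("%m/%d/%Y", "%B %d %Y")  # %B assumes an English locale


def parse_publish_date(raw: Optional[str]) -> Optional[date]:
    """Parse either date style; return None when the value is missing or unparseable."""
    if raw is None:
        return None
    # Drop ordinal suffixes and commas, then collapse repeated whitespace.
    text = _ORDINAL_SUFFIX.sub("", raw.strip()).replace(",", "")
    text = " ".join(text.split())
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```

Returning None instead of raising keeps incomplete rows usable, which matters here because firstPublishDate is only 59 percent complete.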
Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repo.
The 25 fields of the dataset are:
| Attribute | Definition | Completeness (%) |
| ------------- | ------------- | ------------- |
| bookId | Book Identifier as in goodreads.com | 100 |
| title | Book title | 100 |
| series | Series Name | 45 |
| author | Book's Author | 100 |
| rating | Global goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (e.g. Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Publisher | 93 |
| publishDate | Publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Number of total ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL to cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
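The likedPercent field is described as the percentage of ratings above two stars. A minimal sketch of that derivation follows; it assumes the per-star counts are already available as a mapping from star value (1 to 5) to count, which is not necessarily how ratingsByStars is stored, and the function name `liked_percent` is hypothetical.

```python
from typing import Dict, Optional


def liked_percent(ratings_by_stars: Dict[int, int]) -> Optional[float]:
    """Share of ratings with 3, 4 or 5 stars, as a percentage of all ratings."""
    total = sum(ratings_by_stars.get(stars, 0) for stars in range(1, 6))
    if total == 0:
        return None  # no ratings, so the percentage is undefined
    liked = sum(ratings_by_stars.get(stars, 0) for stars in (3, 4, 5))
    return 100.0 * liked / total


# Hypothetical counts: 90 of 100 ratings are above two stars.
print(liked_percent({5: 50, 4: 30, 3: 10, 2: 5, 1: 5}))
```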
The best-selling print book in the United States in 2023, ranked by unit sales, was "It Ends With Us" by Colleen Hoover, with a total of just over **** million sales. Other books that sold well that year included two more by Hoover, "It Starts With Us" and "Verity". The top 10 best-selling print books in 2023 were written mostly by women, with Rebecca Yarros also enjoying over **** million unit sales of both her books, "Fourth Wing" and "Iron Flame". Prince Harry's book "Spare" ranked third with **** million units sold.
Print book sales remain stable
In 2023, adult nonfiction sales of print books in the U.S. amounted to over *** million. Whilst this was the lowest figure recorded since 2017, the category remained the bestselling book category. Adult fiction sales, on the other hand, rose to over *** million units, continuing the upward trend from previous years. Despite the growing prevalence of digital books, print remains a popular option.
Print still accounts for the majority of U.S. book sales revenue
According to the most recently available data, the size of the U.S. audiobook market was estimated at *** billion U.S. dollars, and could soon surpass the *** billion dollar mark if annual growth continues at the same rate. Whilst sales of digital books are generally more difficult to track and therefore prone to adjustments, data shows that print still accounts for the majority of U.S. book sales revenue, at roughly ***** times the amount generated by e-books and audiobooks combined. Print book sales in the U.S. are overall higher than in the early 2000s, showing not only an ongoing interest in the format but even an uptick in book buying in recent years.
The best-selling book in the United States as of the week ending February 10th, 2024 was "The Women" by Kristin Hannah, with ****** thousand units sold. Sarah J. Maas had two titles on the bestseller list that week, "House of Flame and Shadow" and "A Court of Thorns and Roses".
What makes a book a best-seller?
Ultimately, there is no secret ingredient to making a book a best-seller, despite numerous online articles offering tips on how to craft a novel or non-fiction piece that will earn millions. However, being an international icon naturally presents an advantage. Michelle Obama's millions of Instagram followers and her previous position as First Lady of the United States certainly helped thrust her book "Becoming" into the media spotlight. For other writers, such as E. L. James (author of the 'Fifty Shades' series), being part of fan fiction communities and crafting a story based on an existing narrative, notably Stephenie Meyer's 'Twilight', helped to push sales by targeting a particular demographic. Many best-selling books go on to become classics; however, not every work in the classical literary canon shot to fame upon publication.
A look at classic literature
A survey on readership of selected literary genres showed that classics were among the most favored genres among U.S. adults, and numerous books in this category proved popular among Goodreads users. As of November 2018, almost **** million Goodreads users had marked Harper Lee's 'To Kill A Mockingbird' as 'to be read', with other important literary works such as Anne Frank's 'The Diary of a Young Girl' garnering significant interest on the platform, as well as books by George Orwell, F. Scott Fitzgerald and Jane Austen. Anne Frank's book is an excellent example of one which became a best-seller (and indeed, a classic) unintentionally; like so many authors, however, Frank sadly never lived to see her diary grow to global success.
This dataset offers an in-depth look at Amazon's top 100 bestselling books, along with their customer reviews, ratings, prices and more. Whether you're a book enthusiast, a data scientist, or just curious about the latest literary trends, this dataset provides a window into the world of popular reading. (Data scraped in November 2023.)
Book Rank: The ranking of the book among the top 100 bestselling books on Amazon.
Book Title: The title of the book.
Price: The price of the book in USD.
Rating: The overall rating of the book, on a scale of 1 to 5.
Author: The author of the book.
Year of Publication: The year in which the book was published.
Genre: The genre or category to which the book belongs.
URL: The URL of the book on Amazon's platform.
Review Title: The title of the book review.
Reviewer: The name of the person who wrote the review.
Reviewer Rating: The rating given by the reviewer, on a scale of 1 to 5.
Review Description: The text of the review.
Is_verified: Indicates whether the review is verified as a genuine customer review.
Date: The date when the review was posted.
Timestamp: The timestamp of when the review was posted.
ASIN: Amazon Standard Identification Number assigned to products on Amazon.
Feel free to download the data and use it in your work. I look forward to seeing interesting notebooks. Thank you.
Original Data Source: Top 100 Bestselling Book Reviews on Amazon
In March 2023, Prince Harry was the best-selling author in Poland with his memoir, "Ten drugi". Over *** thousand copies of his book were sold in the observed period.
Popularity of books in Poland
The number of public libraries in Poland has been declining year after year. However, this does not change the fact that many Poles enjoy reading. This is evidenced, among other things, by the number of books published and the earnings of popular authors. In 2022, Polish writer and children's book illustrator Anita Głowińska sold more than *********** copies of her books, earning nearly ************* zloty. However, the highest-earning writer in Poland was crime/thriller novelist Remigiusz Mróz, who earned about *********** zloty by selling almost *** thousand of his books.
Books in hard copy versus electronic format
The number of bookstores, like that of libraries, is also declining annually in Poland. This may be related to the growing popularity of e-books. As of early 2023, Audioteka, a free app from which people can download or listen to audiobooks, generated nearly *** million U.S. dollars in revenue. As for devices, the most popular e-book reader was the Amazon Kindle 10, which could be purchased for *** zloty at the lowest price.
The review corpus used here consists of a collection of Goodreads book reviews obtained from the Kaggle website. Originally, it comprised around 10,000 reviews of the top 100 science fiction books (ranked by Goodreads ratings). However, upon examination, we found that a significant number of reviews had issues, including missing values for the review text and like count, as well as duplicate sentences. We removed duplicate sentences from the reviews and excluded reviews with missing fields. After clean-up, the corpus consists of 2259 popular and 2555 non-popular reviews.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 4 rows and is filtered to the book "Bestsellers : popular fiction of the 1970s". It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
In 2023, the most popular book genre in France was DIY (do-it-yourself), leisure and lifestyle activities. Despite the development of new technologies and social networks over the past decade, reading remains one of the favorite activities in France. In the country of famous writers like Victor Hugo or Molière, the majority of the population likes to see itself as an above-average reader.
A passion for hobbies and personal activities
The statistic shows that the most popular book genres in France concern leisure activities, hobbies and education. History books were mentioned by ** percent of French respondents, and art and photography books were preferred by ** percent of interviewees. Fiction, with the exception of comics, does not seem to delight French readers. Comic books of all kinds are experiencing an unprecedented popularity boost among French readers.
A major market
In 2021, almost *** million book copies were sold in France. The French book market is one of the most important in Europe, and the average French person reports reading **** to **** print books per year. Even though e-books are becoming popular, especially among younger generations, the French seem to prefer the print format. In 2021, ** percent of the population in France read print books, compared to ** percent for digital books.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This version of the dataset is obsolete. It contains duplicate ratings (same user_id,book_id), as reported by Philipp Spachtholz in his illustrious notebook.
The current version has duplicates removed, and more ratings (six million), sorted by time. Book and user IDs are the same.
**It is available at https://github.com/zygmuntz/goodbooks-10k.**
There have been good datasets for movies (Netflix, Movielens) and music (Million Songs) recommendation, but not for books. That is, until now.
This dataset contains ratings for ten thousand popular books. As to the source, let's say that these ratings were found on the internet. Generally, there are 100 reviews for each book, although some have less - fewer - ratings. Ratings go from one to five.
Both book IDs and user IDs are contiguous. For books, they are 1-10000, for users, 1-53424. All users have made at least two ratings. Median number of ratings per user is 8.
There are also books marked to read by the users, book metadata (author, year, etc.) and tags.
ratings.csv contains ratings and looks like this:
book_id,user_id,rating
1,314,5
1,439,3
1,588,5
1,1169,4
1,1185,4
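Since the obsolete version is reported to contain duplicate ratings for the same (user_id, book_id) pair, a quick check is worth running on whichever copy of ratings.csv is in use. The following is a minimal sketch using only the standard library and the sample rows shown above; the function name `find_duplicate_ratings` is hypothetical.

```python
import csv
import io
from collections import Counter

# The sample rows shown above, inlined so the sketch runs without the real file.
SAMPLE = """book_id,user_id,rating
1,314,5
1,439,3
1,588,5
1,1169,4
1,1185,4
"""


def find_duplicate_ratings(csv_file):
    """Return sorted (user_id, book_id) pairs that occur more than once."""
    reader = csv.DictReader(csv_file)
    counts = Counter((int(row["user_id"]), int(row["book_id"])) for row in reader)
    return sorted(pair for pair, n in counts.items() if n > 1)


print(find_duplicate_ratings(io.StringIO(SAMPLE)))  # the sample has no duplicates
```

For the real file, an open file handle can be passed in place of the `io.StringIO` wrapper.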
to_read.csv provides IDs of the books marked "to read" by each user, as user_id,book_id pairs.
books.csv has metadata for each book (goodreads IDs, authors, title, average rating, etc.).
The metadata have been extracted from goodreads XML files, available in the third version of this dataset as books_xml.tar.gz. The archive contains 10000 XML files. One of them is available as sample_book.xml. To make the download smaller, these files are absent from the current version. Download version 3 if you want them.
book_tags.csv contains tags/shelves/genres assigned by users to books. Tags in this file are represented by their IDs.
tags.csv translates tag IDs to names.
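Resolving a book's tags therefore takes a lookup across both files. The following is a rough sketch of that join. The column names (`goodreads_book_id`, `tag_id`, `count`, `tag_name`) and all sample rows are assumptions made for illustration and should be checked against the actual file headers; the function name is hypothetical.

```python
import csv
import io

# Hypothetical sample rows; column names are assumed, not confirmed by the text above.
BOOK_TAGS_CSV = """goodreads_book_id,tag_id,count
1,11,500
1,22,300
2,11,40
"""
TAGS_CSV = """tag_id,tag_name
11,fantasy
22,young-adult
"""


def tag_names_for_book(book_tags_file, tags_file, book_id):
    """Return the tag names attached to one book, in file order."""
    id_to_name = {row["tag_id"]: row["tag_name"] for row in csv.DictReader(tags_file)}
    return [
        id_to_name.get(row["tag_id"], row["tag_id"])  # fall back to the raw ID
        for row in csv.DictReader(book_tags_file)
        if row["goodreads_book_id"] == str(book_id)
    ]


print(tag_names_for_book(io.StringIO(BOOK_TAGS_CSV), io.StringIO(TAGS_CSV), 1))
```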
See the notebook for some basic stats of the dataset.
Each book may have many editions. goodreads_book_id and best_book_id generally point to the most popular edition of a given book, while goodreads work_id refers to the book in the abstract sense.
You can use the goodreads book and work IDs to create URLs as follows:
https://www.goodreads.com/book/show/2767052
https://www.goodreads.com/work/editions/2792775
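The two URL patterns above can be wrapped in small helpers. This is only a sketch: the function names are hypothetical, and the patterns are copied from the examples above rather than from any documented Goodreads API.

```python
BOOK_URL = "https://www.goodreads.com/book/show/{}"
WORK_URL = "https://www.goodreads.com/work/editions/{}"


def book_url(goodreads_book_id: int) -> str:
    """URL of a specific edition, built from its goodreads book ID."""
    return BOOK_URL.format(goodreads_book_id)


def work_url(work_id: int) -> str:
    """URL listing all editions of a work, built from its goodreads work ID."""
    return WORK_URL.format(work_id)


print(book_url(2767052))
print(work_url(2792775))
```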
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 5 rows and is filtered to the book "Reading popular romance in early modern England". It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
In 2020, ** percent of book readers in the United States favored a combination of print, e-books, and audiobooks, compared to ** percent of respondents from the United Kingdom and Germany. Print books remained popular with more than ** percent of readers in all three countries preferring to read print only, whereas very few readers listened to or read exclusively audiobooks or e-books.
The statistic displays the results of a survey on the most popular e-book genres in the Netherlands in 2018. Participants who sometimes read e-books were asked from which genres they sometimes read books. The survey outcome indicated that suspense fiction books were most popular. Over 60 percent of respondents occasionally read books from this genre, whereas the number who read young adult books was comparatively low at just ten percent.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains lists of best-selling books and book series in any language. The term "best-selling" refers to the estimated number of copies sold of each book, not the number of books printed or currently owned. The lists exclude comic books and textbooks. The books are ordered by the highest sales estimate reported by credible, independent sources.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 1 row and is filtered to the book "Popular music in America : the beat goes on". It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
According to a survey conducted in China released in April 2025, dangdang.com and JD.com were the most popular channels for book purchases, confirmed by over 54 percent of respondents. About 22 percent of survey participants said they bought books from physical bookstores.
Top 10 most popular e-books of Hong Kong Public Libraries consulted by the general public (with links for online reading/download)
The graph shows the leading book genres in the United States as of July 2015, by reader's gender. During a survey, seven percent of male and 44 percent of female respondents stated they had read a romance in the year leading up to the survey.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 1 row and is filtered to the book "Popular magic : cunning folk in English history". It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Goodreads Top 100 Classics’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/notkrishna/goodreads-top-100-classical-books-of-all-time on 28 January 2022.
--- Dataset description provided by original source is as follows ---
These are the top 25 of that list as compiled by Entertainment Weekly. All had to have been published in the last 25 years (1983-2008). Add more, vote for your favorites, or tell us where they went wrong!
Being an avid book reader and a member of goodreads.com, I always wanted to combine my love for data and books. So here's goodreads top 100 classics in the last 25 years.
The data was acquired by scraping goodreads.com. Goodreads is an American social cataloging website and a subsidiary of Amazon that allows individuals to search its database of books, annotations, quotes, and reviews. Users can sign up and register books to generate library catalogs and reading lists. It's one of the biggest communities for readers, with books ranging from classics to newer releases.
I have been part of this community for a long time and wanted to share this data with the Kaggle community to see what you do with it.
Source: Goodreads Top 100 classics
Your data will be in front of the world's largest readers' community. What questions do you want to see answered?
--- Original source retains full ownership of the source dataset ---
The statistic displays the results of a survey regarding book genres read in the Netherlands in 2018. The survey results indicate that, of the various genres considered, the most frequently read genre in the Netherlands in 2018 was thriller fiction. Over 35 percent of respondents indicated that they regularly read thrillers, whereas poetry was less popular, with just over five percent of respondents indicating that they read poetry at least once a month.