According to the results of a survey held in the United States, the share of Americans who had read more than ** books in the last three months stood at **** percent in February 2024. However, **** percent had not read any books in the three months running up to the survey.
In 2022 there were more than 5.4 million book readers in Italy between the ages of six and 24 years who read at least one book in the last 12 months. By comparison, the corresponding figure for those aged 45 to 64 years stood at more than 6.7 million, with this age group also being the most likely to read several books per year.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 89 rows and is filtered to books whose subject is Statistics-Problems, exercises, etc. It features 9 columns including author, publication date, language, and book publisher.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The dataset was collected as part of Prac1 for the course Typology and Data Life Cycle in the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).
The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).
The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset
The data was retrieved in two sets: the first 30000 books and then the remaining 22478. Dates were not parsed and reformatted in the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and as Month Day Year for the rest.
Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repo.
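Because the two chunks store dates differently, a single parser that tries each layout in turn can normalize both date fields. The following is a minimal sketch, not the dataset author's code: the function name `parse_publish_date`, the list of candidate formats, and the handling of ordinal suffixes such as "14th" are all assumptions about how the strings may look in the file.

```python
import re
from datetime import date, datetime
from typing import Optional

# Candidate layouts (assumed): numeric mm/dd/yyyy (or mm/dd/yy) for the first
# chunk, and "Month Day Year" for the second chunk.
_FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%B %d %Y")

# Matches ordinal suffixes that directly follow a digit, e.g. "14th" or "1st".
_ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def parse_publish_date(raw: str) -> Optional[date]:
    """Return a date for either layout, or None if nothing matches."""
    text = _ORDINAL.sub("", raw.strip()).replace(",", "")
    text = " ".join(text.split())  # collapse repeated whitespace
    if not text:
        return None
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```

Values that match none of the assumed layouts come back as `None`, so they can be counted and inspected rather than silently coerced. Note that `%B` relies on English month names, which is the default behaviour in the C locale.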
The 25 fields of the dataset are:
| Attributes | Definition | Completeness |
| ------------- | ------------- | ------------- |
| bookId | Book Identifier as in goodreads.com | 100 |
| title | Book title | 100 |
| series | Series Name | 45 |
| author | Book's Author | 100 |
| rating | Global goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (ex. Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Publishing house | 93 |
| publishDate | Publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Number of total ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL to cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
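The Completeness column and the derived likedPercent field in the table above can both be expressed as simple ratios. The sketch below is an illustration under stated assumptions: it treats completeness as the percentage of rows with a non-empty value, and it represents ratingsByStars as a hypothetical mapping from star value to count, which may not match how the field is serialized in the file. The function names are invented for this example.

```python
from typing import Mapping, Optional, Sequence


def completeness(rows: Sequence[Mapping[str, Optional[str]]], field: str) -> float:
    """Percent of rows in which `field` is present and non-empty (assumed definition)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if (row.get(field) or "").strip())
    return 100.0 * filled / len(rows)


def liked_percent(ratings_by_stars: Mapping[int, int]) -> Optional[float]:
    """Percent of ratings above 2 stars; None when there are no ratings.

    `ratings_by_stars` maps a star value (1-5) to its count. This shape is an
    assumption made for the example.
    """
    total = sum(ratings_by_stars.values())
    if total == 0:
        return None
    liked = sum(count for stars, count in ratings_by_stars.items() if stars > 2)
    return 100.0 * liked / total
```

Rows loaded with `csv.DictReader` can be passed to `completeness` directly. Any rounding that GoodReads applies to likedPercent is not reproduced here.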
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book series. It has 1 row and is filtered to the series containing the book Statistics : a guide for therapists. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 11 rows and is filtered where the book is Statistics for management. It features 7 columns including author, publication date, language, and book publisher.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Goodreads Book Reviews dataset encapsulates a wealth of reviews and various attributes concerning the books listed on the Goodreads platform. A distinguishing feature of this dataset is its capture of multiple tiers of user interaction, ranging from adding a book to a "shelf", to rating and reading it. This dataset is a treasure trove for those interested in understanding user behavior, book recommendations, sentiment analysis, and the interplay between various attributes of books and user interactions.
Basic Statistics:
- Items: 1,561,465
- Users: 808,749
- Interactions: 225,394,930

Metadata:
- Reviews: The text of the reviews provided by users.
- Add-to-shelf, Read, Review Actions: Various interactions users have with the books.
- Book Attributes: Attributes describing the books, including title and ISBN.
- Graph of Similar Books: A graph depicting similarity relations between books.
Example (interaction data):
```json
{
  "user_id": "8842281e1d1347389f2ab93d60773d4d",
  "book_id": "130580",
  "review_id": "330f9c153c8d3347eb914c06b89c94da",
  "isRead": true,
  "rating": 4,
  "date_added": "Mon Aug 01 13:41:57 -0700 2011",
  "date_updated": "Mon Aug 01 13:42:41 -0700 2011",
  "read_at": "Fri Jan 01 00:00:00 -0800 1988",
  "started_at": ""
}
```
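The timestamps in the example record above are strings with a weekday, a UTC offset, and the year at the end, and unset fields are empty strings. A minimal sketch of how such a record might be parsed follows. The helper names are invented for this example, the format string is inferred from the single sample shown, and it is an assumption that every record in the dataset follows the same layout.

```python
import json
from datetime import datetime
from typing import Optional

# Inferred from the sample record, e.g. "Mon Aug 01 13:41:57 -0700 2011".
TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"
TIME_FIELDS = ("date_added", "date_updated", "read_at", "started_at")


def parse_timestamp(raw: Optional[str]) -> Optional[datetime]:
    """Return a timezone-aware datetime, or None for empty or missing values."""
    if not raw:
        return None
    return datetime.strptime(raw, TIME_FORMAT)


def parse_interaction(line: str) -> dict:
    """Decode one JSON record and convert its timestamp fields."""
    record = json.loads(line)
    for key in TIME_FIELDS:
        record[key] = parse_timestamp(record.get(key))
    return record
```

Keeping the UTC offset means intervals such as the time between `date_added` and `date_updated` can be computed directly by subtracting the two values.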
Use Cases:
- Book Recommendations: Creating personalized book recommendations based on user interactions and preferences.
- Sentiment Analysis: Analyzing sentiment in reviews and understanding how different book attributes influence sentiment.
- User Behavior Analysis: Understanding user interaction patterns with books and deriving insights to enhance user engagement.
- Natural Language Processing: Training models to process and analyze user-generated text in reviews.
- Similarity Analysis: Analyzing the graph of similar books to understand book similarities and clustering.
Citation:
Please cite the following if you use the data:
Item recommendation on monotonic behavior chains
Mengting Wan, Julian McAuley
RecSys, 2018
[PDF](https://cseweb.ucsd.edu/~jmcauley/pdfs/recsys18e.pdf)
Code Samples: A curated set of code samples is provided in the dataset's GitHub repository to help with working with the datasets. These include:
- Downloading datasets without GUI: Facilitating dataset download in a non-GUI environment.
- Displaying Sample Records: Showcasing sample records to get a glimpse of the dataset structure.
- Calculating Basic Statistics: Computing basic statistics to understand the dataset's distribution and characteristics.
- Exploring the Interaction Data: Delving into interaction data to grasp user-book interaction patterns.
- Exploring the Review Data: Analyzing review data to extract valuable insights from user reviews.

Additional Dataset:
- Complete book reviews (~15m multilingual reviews about ~2m books and 465k users): A comprehensive multilingual collection of reviews covering around 2 million books from 465,000 users.
This dataset was created by Carlos Heryhelder
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database contains information about books gathered with the help of the Google Books API. The database contains 7 tables, 3 of which serve only to relate the other tables to one another. Tables: Books contains 1062 records. Authors contains 1595 records. Categories contains 109 records. Metadata contains 37 records. MD5 (GBooks_2015-06-09.sql) = bfd09094d0e123e668b2e58332b1a98b
The number of digital books borrowed from libraries and schools hit *** million in 2023. E-books have historically been more popular among digital book borrowers, with *** million borrowed from the ** thousand libraries and schools included in the study in that year.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 226 rows and is filtered to books whose subject is Commercial statistics. It features 9 columns including author, publication date, language, and book publisher.
The country reported to have read books most regularly in 2017 was China, where a survey among internet users across ** countries revealed that ** percent of respondents read a book every day or most days, and ** percent read at least once a week. Conversely, just ** percent of South Korean respondents were reading books on a daily basis. Other countries with a low share of those aged 15 years or above reading daily included Belgium, Japan, the Netherlands and Mexico.
Age and reading habits
It is surprising how much age can affect reading habits, even on a global level. In Germany, more 12 to 13-year-olds read daily or several times per week than their slightly older peers. Meanwhile, in the United Kingdom, a survey showed that more teenagers and Millennials said that they would be happy without books than adults aged 34 or older. More than double the percentage of adults in Colombia aged 65 or above read a book every day than those aged between 12 and 25 years.
The number of books read over the past year in the United States was overall higher among adults aged 18 to 34 than older generations, and in Canada the share of children reading books for fun halved with the approach of teenage years. Whilst ** percent of children aged between six and eight years old were reading for pleasure multiple times per week, among ** to 17-year-olds this figure amounted to just ** percent. Meanwhile, the opposite was true of going online for fun, which increased sharply with age and replaced the activity of reading.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Explore the statistics for Books & Literature eCommerce in 2025, including store count by region and platform, estimated sales amount by platform and region, products sold by platform and region, and total app spend by platform and region. Gain insights into regional preferences, market penetration, consumer trends, and technological investments within the Books & Literature sector. Discover the leading regions and platforms, as well as the dynamics of sales and product volumes. Stay informed about the evolving landscape of Books & Literature online stores for a comprehensive understanding of the market.
I wanted to find good data about representation and diversity in literature, which brought me to the following page of the Cooperative Children's Book Center (CCBC): https://ccbc.education.wisc.edu/literature-resources/ccbc-diversity-statistics/. The following is data on books by and about Black, Indigenous and People of Color published for children and teens compiled by the Cooperative Children’s Book Center, School of Education, University of Wisconsin-Madison.
There are two .csv files in the data set. One shows books received by the CCBC from US publishers per year that are authored and/or illustrated by a Black/African/Indigenous/Asian/Pacific Islander/Latinx person, and the other shows books received by the CCBC from US publishers per year that feature a BIPOC character. Further explanation can be found at the CCBC FAQ page.
Please note that for 2018 and 2019, the .csv files below represent Asian/Pacific Islander people as one column, which is how the CCBC published the data between 2002 and 2017. Also note that the attached data are not the entire data collected by the CCBC. The CCBC also collects books from international publishers, and since 2018, the CCBC has been publishing data about books by/about Arabs.
All data was collected by the CCBC. Please see the following page (with the complete data) about how to cite the data in your publications/blogs/notebooks: https://ccbc.education.wisc.edu/literature-resources/ccbc-diversity-statistics/books-by-about-poc-fnn/.
I am curious to see what sorts of visualizations people can make in exploratory analysis of this data! Also, can you predict how many BIPOC books the CCBC will receive in 2020? What happens when you study against US population data?
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Our data sheds light on the distribution of Books & Literature stores across different online platforms. WooCommerce leads with a substantial number of stores, holding 34.64K stores, which accounts for 38.12% of the total in this category. Shopify follows with 14.34K stores, making up 15.78% of the Books & Literature market. Meanwhile, Custom Cart offers a significant presence as well, with 12.51K stores, or 13.76% of the total. This chart gives a clear picture of how stores within the Books & Literature sector are spread across these key platforms.
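Each platform figure above pairs a store count with a percentage share, so the total number of stores in the category can be back-calculated as a consistency check. The sketch below uses only the three figures quoted in the text. The variable and function names are illustrative, and the "other platforms" share is simply the remainder, not a figure reported by the source.

```python
# (stores in thousands, share of category in percent), as quoted in the text.
platforms = {
    "WooCommerce": (34.64, 38.12),
    "Shopify": (14.34, 15.78),
    "Custom Cart": (12.51, 13.76),
}


def implied_total(stores_k: float, share_pct: float) -> float:
    """Total stores (in thousands) implied by one platform's count and share."""
    return stores_k / (share_pct / 100.0)


implied_totals = {name: implied_total(*pair) for name, pair in platforms.items()}

# Share left over for all platforms not named in the text (derived, not reported).
other_share_pct = 100.0 - sum(share for _, share in platforms.values())
```

All three platforms imply a category total of roughly 90.9K stores, which suggests the quoted counts and percentages are mutually consistent up to rounding.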
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book series. It has 1 row and is filtered to the series containing the book Measurement, statistics and computation. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Detailed financial statistics for book publishers (North American Industry Classification System (NAICS) 511130), including all members under detailed financial statistics, by country of control (dollars x 1,000,000), reported every 2 years, for five years of data.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Delving into the Books & Literature sector, our data presents a look at store distribution by region, highlighting regional preferences and market penetration in this niche. The United States leads with 30.11K stores, which is 47.11% of the total. The United Kingdom follows, contributing 8.09K stores, which is 12.67% of the total. Australia comes third, with 3.34K stores, making up 5.23% of the total.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The dataset contains metadata for over 200,000 Christian books and papers covering a wide range of genres and topics. The data was collected from a variety of sources, including online retailers, libraries, and publishers.
This dataset contains a series of metadata for Christian books, including the following fields:
Net value of book sales by customer category, includes all members under Net value of book sales by customer category, for Book publishers, for Canada and regions, for one year of data.