100+ datasets found

Best Books Ever Dataset
zenodo.org
csv
Updated Nov 10, 2020
Cite
Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.4265096
Dataset updated
Nov 10, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Lorena Casanova Lozano; Sergio Costa Planells
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The dataset was collected as part of Prac1 of the subject Typology and Data Life Cycle of the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).

The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).

The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset

The data was retrieved in two sets: the first 30000 books and then the remaining 22478. Dates were not parsed and reformatted in the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and in Month Day Year format for the rest.
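
Because the two chunks use different date layouts, a reader loading the file may want to normalize publishDate and firstPublishDate before analysis. The following is a minimal sketch, not the authors' code: the function name parse_publish_date is hypothetical, the exact strptime patterns are assumptions based on the "mm/dd/yyyy" and "Month Day Year" wording above, and the handling of ordinal suffixes such as "14th" is an extra assumption about how the second chunk may be spelled.

```python
import re
from datetime import date, datetime
from typing import Optional

# Patterns inferred from the description ("mm/dd/yyyy" and "Month Day Year").
# The exact spelling used in the file is an assumption.
_FORMATS = ("%m/%d/%Y", "%B %d %Y")

# Assumption: days may carry an ordinal suffix ("1st", "14th"); strip it if present.
_ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def parse_publish_date(raw: str) -> Optional[date]:
    """Return a date for either layout, or None when neither pattern matches."""
    text = _ORDINAL.sub("", raw.strip().replace(",", ""))
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next known layout
    return None
```

Note that %B matches English month names only under an English or C locale. Returning None rather than raising keeps rows with missing or unexpected dates visible for later inspection.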

Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repo.

The 25 fields of the dataset are:

| Attributes | Definition | Completeness (%) |
| ------------- | ------------- | ------------- |
| bookId | Book identifier as in goodreads.com | 100 |
| title | Book title | 100 |
| series | Series name | 45 |
| author | Book's author | 100 |
| rating | Global Goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (e.g. Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Publisher | 93 |
| publishDate | Publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Total number of ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL to cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
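
The Completeness column can be checked against a local copy of the file. The sketch below assumes that completeness means the percentage of records with a non-empty value in a field; the description does not state how the authors computed it, and the function name completeness is hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional


def completeness(rows: Iterable[Mapping[str, Optional[str]]]) -> Dict[str, int]:
    """Percentage of rows with a non-empty value, per field (rounded)."""
    filled: Dict[str, int] = {}
    total = 0
    for row in rows:
        total += 1
        for field, value in row.items():
            filled.setdefault(field, 0)
            # Assumption: a value is "missing" when it is None or blank.
            if value is not None and value.strip() != "":
                filled[field] += 1
    if total == 0:
        return {}
    return {field: round(100 * count / total) for field, count in filled.items()}
```

Rows can come from csv.DictReader in the standard library, which yields one mapping per record with the header names as keys.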
institutional-books-1.0
huggingface.co
Updated Jun 11, 2025
+ more versions
Cite
Institutional Data Initiative (2025). institutional-books-1.0 [Dataset]. https://huggingface.co/datasets/institutional/institutional-books-1.0
Dataset updated
Jun 11, 2025
Dataset authored and provided by
Institutional Data Initiative
Description
📚 Institutional Books 1.0

Institutional Books is a growing corpus of public domain books. This 1.0 release comprises 983,004 public domain books digitized as part of Harvard Library's participation in the Google Books project and refined by the Institutional Data Initiative. Use of this data is governed by the IDI Terms of Use for Early-Access.

983K books, published largely in the 19th and 20th centuries
242B o200k_base tokens
386M pages of text, available in both original…

See the full description on the dataset page: https://huggingface.co/datasets/institutional/institutional-books-1.0.
Books Dataset
figshare.com
txt
Updated Jan 19, 2016
Cite
Giuseppe Mendola (2016). Books Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.1441255.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.1441255.v1
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Giuseppe Mendola
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This database contains information about books gathered with the help of the Google Books API. It contains 7 tables, 3 of which serve only to relate the other tables to one another. Tables: Books (1062 records), Authors (1595 records), Categories (109 records), Metadata (37 records). MD5 (GBooks_2015-06-09.sql) = bfd09094d0e123e668b2e58332b1a98b
Data from: Christian Books Dataset
kaggle.com
Updated Sep 8, 2023
Cite
Chibuzor Nwachukwu (2023). Christian Books Dataset [Dataset]. https://www.kaggle.com/datasets/chibuzornwachukwu/christian-books-dataset
Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
Dataset updated
Sep 8, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Chibuzor Nwachukwu
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The dataset contains metadata for over 200,000 Christian books and papers covering a wide range of genres and topics. The data was collected from a variety of sources, including online retailers, libraries, and publishers.

This dataset contains a series of metadata for Christian books, including the following fields:

title: The title of the book.

author_name: The author of the book.

publisher: The publisher of the book.

publish_date: The date the book was published.

publish_place: The place where the book was published.

isbn: The International Standard Book Number (ISBN) of the book.

genre: The genre of the book, such as fiction, non-fiction, or theology.

ia_collection: Internet Archive Collection.

first_sentence

language: The languages in which the books are written.

currently_reading_count:

edition_count: Number of recognized publications of books/papers

edition_key:

number_of_pages_median: Median number of pages in books.
Hindawi-Books-dataset
huggingface.co
Updated Jul 31, 2023
Cite
Ali El Filali (2023). Hindawi-Books-dataset [Dataset]. https://huggingface.co/datasets/alielfilali01/Hindawi-Books-dataset
Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
Dataset updated
Jul 31, 2023
Authors
Ali El Filali
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Dataset Card for "Hindawi Books Dataset"

Hindawi Books Dataset is a large collection of more than 3000 books written in Modern Standard Arabic.

Dataset Description

Hindawi Books Dataset offers a rich and diverse collection of literary works, covering various topics and genres, all written in Modern Standard Arabic. The dataset includes information about each book, such as the title, author name, book abstract, and a link to access the complete text online. Additionally… See the full description on the dataset page: https://huggingface.co/datasets/alielfilali01/Hindawi-Books-dataset.
Data from: arabic-books
huggingface.co
Updated Nov 28, 2024
Cite
Mohamed Rashad (2024). arabic-books [Dataset]. https://huggingface.co/datasets/MohamedRashad/arabic-books
Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
Dataset updated
Nov 28, 2024
Authors
Mohamed Rashad
License
https://choosealicense.com/licenses/gpl-3.0/
Description
Arabic Books

Dataset Summary

The arabic-books dataset contains 8,500 rows of text, each representing the full text of a single Arabic book. These texts were extracted using the arabic-large-nougat model, showcasing the model’s capabilities in Arabic OCR and text extraction. The dataset spans a total of 1.1 billion tokens, calculated using the GPT-4 tokenizer. This dataset is a testimony to the quality of the Arabic Nougat models and their effectiveness in extracting… See the full description on the dataset page: https://huggingface.co/datasets/MohamedRashad/arabic-books.
Oriented Books Dataset
universe.roboflow.com
zip
Updated Jul 4, 2024
Cite
koteitan (2024). Oriented Books Dataset [Dataset]. https://universe.roboflow.com/koteitan/oriented-books
Available download formats: zip
Dataset updated
Jul 4, 2024
Dataset authored and provided by
koteitan
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Variables measured
Book Bounding Boxes
Description
Oriented Books

## Overview

Oriented Books is a dataset for object detection tasks. It contains Book annotations for 661 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [MIT license](https://creativecommons.org/licenses/MIT).
Book consumption in the U.S. 2011-2021, by format
statista.com
Updated Jun 24, 2025
Cite
Statista (2025). Book consumption in the U.S. 2011-2021, by format [Dataset]. https://www.statista.com/statistics/222754/book-format-used-by-readers-in-the-us/
Dataset updated
Jun 24, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
Reading books remains a popular pastime for U.S. adults, with ** percent of respondents to a 2021 survey saying that they had read a book in any format within the last year. Despite online media formats now being the preferred option for many consumers when it comes to television, music, and gaming, print books are by far the most popular format among readers in the United States. Whilst almost double the share of adults now read audiobooks compared to 2011, only ** percent claimed to have read an audiobook in the last year compared to ** percent who said that they had read a print book.

Book sales in the United States

In 2020, bookstore sales in the United States amounted to **** billion U.S. dollars. Sales in 2019 and 2020 were the lowest recorded since the early *****, and the combined effect of the coronavirus outbreak, along with the growing appeal of online purchasing, will likely mean that bookstore sales will continue to drop. Bookstores tend to see most success in August, December, and January, and sales revenue often surpasses *********** U.S. dollars in those months each year. That said, monthly retail sales of bookstores in the U.S. are notably lower overall than in previous years and were particularly poor in spring 2020 as a result of national shutdowns to stem the spread of COVID-19.

Influence of COVID-19 on reading habits

The coronavirus pandemic led to increased media consumption in general, but not only among avid video and music streaming fans. Data from a survey in March 2020 revealed that ** percent of Millennials read more books due to the COVID-19 outbreak, making consumers in this group the most likely to have done so compared to ** percent of the total survey sample. Meanwhile, ** percent of Boomers said that their reading habits had not changed.
Google Books Dataset
kaggle.com
Updated Nov 25, 2019
Cite
Bilal Yussef (2019). Google Books Dataset [Dataset]. https://www.kaggle.com/datasets/bilalyussef/google-books-dataset/code
Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
Dataset updated
Nov 25, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Bilal Yussef
Description
Context

This data was gathered as part of the data mining project for General Assembly Data Science Immersive course.

Content

This data was acquired from the Google Books store using the Google API. Nine features were gathered for each book in the data set. The column names are mostly self-explanatory; nevertheless, they are explained below.

title : the title of the book.

authors : names of the authors of the book (may include more than one author).

language : the language of the book

generes\categories : the categories associated with the book (by Google store)

rating\averageRating : the average rating of each book out of 5.

maturityRating : whether the content of the book is for a MATURE or NOT MATURE audience.

publisher : the name of the publisher.

publishedDate : when the book was published.

pageCount : number of pages of the books.

voters : the number of voters to the book.

ISBN : the unique identifier for each book.

description : brief introductory description of the book.

price : price of the book on the google books store

currency : the currency of the price in the google books store.

Acknowledgements

I would like to thank Google for making a freely available API for their services and websites. I would also like to acknowledge the effort of the web scraper extension developer; it is a really nice and powerful tool for web scraping.

Licenses

©2019 Google

Inspiration

Here is a story. You love reading books, and recently you bought a book that you thought you would like. However, after reading half the book, you still don't feel the enthusiasm and joy you expected. I think that machine learning algorithms might help solve such problems.
Dataset of author, book publisher and ISBN of books
workwithdata.com
Updated Apr 17, 2025
+ more versions
Cite
Work With Data (2025). Dataset of author, book publisher and ISBN of books [Dataset]. https://www.workwithdata.com/datasets/books?col=author%2Cbook%2Cbook_publisher%2Cisbn
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 2,617,384 rows. It features 4 columns: author, book, book publisher, and ISBN. It is 97% filled with non-null values.
Dataset of books called Genre
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called Genre [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Genre
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 6 rows and is filtered where the book is Genre. It features 7 columns including author, publication date, language, and book publisher.
Google Books Ngrams
registry.opendata.aws
Updated Apr 20, 2018
Cite
Not managed (2018). Google Books Ngrams [Dataset]. https://registry.opendata.aws/google-ngrams/
Dataset updated
Apr 20, 2018
Dataset provided by
Not managed
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
N-grams are fixed-size tuples of items. In this case the items are words extracted from the Google Books corpus. The n specifies the number of elements in the tuple, so a 5-gram contains five words or characters. The n-grams in this dataset were produced by passing a sliding window over the text of books and outputting a record for each new token.
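
The sliding-window idea can be illustrated in a few lines. This is a generic sketch of n-gram extraction over a list of tokens, not the pipeline used to build the Google Books corpus; the function name ngrams and the whitespace tokenization in the usage line are assumptions made for the example.

```python
from typing import Iterator, List, Tuple


def ngrams(tokens: List[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every run of n consecutive tokens, advancing one token at a time."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # The window starts at each position that still leaves n tokens to read.
    for start in range(len(tokens) - n + 1):
        yield tuple(tokens[start:start + n])


# Usage: bigrams over a whitespace-tokenized sentence.
bigrams = list(ngrams("the quick brown fox".split(), 2))
```

A text with fewer than n tokens yields no n-grams at all, since the window never fits.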
Preferred book formats in the U.S. 2020
statista.com
Updated Mar 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Preferred book formats in the U.S. 2020 [Dataset]. https://www.statista.com/statistics/299074/book-consumption-per-capita-print-ebook-usa/
Dataset updated
Mar 10, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Mar 28, 2020 - Apr 27, 2020
Area covered
United States
Description
According to a survey held in the United States between March and April 2020, 70 percent of respondents said that they read print books the most, with 39 percent of those consumers preferring their books to be new.

The study was conducted as the U.S. went into lockdown to prevent the spread of the coronavirus. However, although the virus certainly affected media consumption in the United States, consumers' book preferences did not change. Print has always been the most popular book format in the U.S., and figures on increased media consumption during the pandemic showed that even Gen Z, a generation famed for loving digital, were the most likely to be reading books more than usual during the outbreak.

Book consumption in the U.S.

Whilst printed newspapers and magazines have struggled to survive as digital formats grow ever more prevalent and appealing, when it comes to books U.S. consumers still have a clear preference for print. Annual survey data consistently shows that U.S. adults are far more likely to have read a print book in the last year than a digital version thereof, and whilst the popularity of digital books has increased, print remains the favorite.

As far as book buying goes, whilst the number of print books sold in the U.S. fluctuates each year, the figures remain relatively stable. Although unit sales have not surpassed 700 million since 2010, the number came close in 2018 and yearly sales from 2015 to 2019 were higher than the amount recorded in 2004.
Books, Minds, and Bodies dataset
ora.ox.ac.uk
Updated Jan 1, 2022
Cite
Troscianko, E; Carney, J; Holman, E (2022). Books, Minds, and Bodies dataset [Dataset]. http://doi.org/10.5287/bodleian:gJZz9KDE0
Available download formats: (10133), (124412), (10276), (41302)
Unique identifier
https://doi.org/10.5287/bodleian:gJZz9KDE0
Dataset updated
Jan 1, 2022
Dataset provided by
University of Oxford
Authors
Troscianko, E; Carney, J; Holman, E
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
These data were gathered during the Books, Minds, and Bodies research project in 2015-16. The project was designed to investigate the therapeutic potential of shared reading, and involved running 2 reading groups over 2 consecutive terms and recording participants' discussions of the texts being read aloud together. These recordings were subsequently transcribed and used for analysis of emotional variance and linguistic similarity.

Consistent with the ethical approval granted for the study, word order in the transcripts has been randomized so as to preclude any personal data being disclosed. This was done by tokenizing the text of each transcript into grammatical and lexical units (i.e. punctuation signs and words). These were shuffled using the "Random" module in the Python programming language, which provides a range of mathematical operations for collections of discrete objects. Nevertheless, grouping variables were preserved at the level of group (MT and HT terms) and session ID. As the calculation of values for emotional variance (on the dimensions of valence, arousal, and dominance) does not require syntax to be preserved, randomizing the data in this way should not affect the future calculation of word norm values.
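
The randomization step described above can be sketched as follows. This is an illustration only, not the project's script: the regular expression used for tokenizing, the function name shuffle_transcript, and the optional seed parameter are all assumptions introduced for the example.

```python
import random
import re
from typing import List, Optional

# Assumption: a token is either a word (optionally with an internal apostrophe)
# or a single punctuation sign; whitespace is discarded.
_TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def shuffle_transcript(text: str, seed: Optional[int] = None) -> List[str]:
    """Split a transcript into words and punctuation, then shuffle their order."""
    tokens = _TOKEN.findall(text)
    rng = random.Random(seed)  # a seed makes the shuffle repeatable for testing
    rng.shuffle(tokens)        # in-place shuffle of the token list
    return tokens
```

Shuffling changes only the order of the tokens, so any measure computed from word frequencies alone gives the same result before and after, which is the property the description relies on for word norm values.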

The dataset also includes text/discussion similarity calculations, qualitative coding results, and participants' post-participation feedback data.

NB: this dataset replaces 'Books, Minds, and Bodies: raw transcript text plus VAD values' at https://ora.ox.ac.uk/objects/uuid:c370b75b-d37e-41be-89bb-cbb67a0c8614
Data from: Arabic Books Dataset
universe.roboflow.com
zip
Updated Aug 12, 2024
Cite
Moayad alghamdi (2024). Arabic Books Dataset [Dataset]. https://universe.roboflow.com/moayad-alghamdi/arabic-books
Available download formats: zip
Dataset updated
Aug 12, 2024
Dataset authored and provided by
Moayad alghamdi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Paragraphs Bounding Boxes
Description
Don't worry about it, This is professionals work right here!
goodreads-book-descriptions
huggingface.co
Updated Jun 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Book Souls (2024). goodreads-book-descriptions [Dataset]. https://huggingface.co/datasets/booksouls/goodreads-book-descriptions
Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
Dataset updated
Jun 23, 2024
Authors
Book Souls
Description
Goodreads Book Descriptions

A dataset of English book titles and descriptions from Goodreads. The original dataset has 2.3 million books total with many more fields. There may exist a small number of non-English books in this dataset.

Citations

Mengting Wan, Julian McAuley, "Item Recommendation on Monotonic Behavior Chains", in RecSys'18.
Mengting Wan, Rishabh Misra, Ndapa Nakashole, Julian McAuley, "Fine-Grained Spoiler Detection from Large-Scale Review Corpora", in…

See the full description on the dataset page: https://huggingface.co/datasets/booksouls/goodreads-book-descriptions.
Dataset of books published by Houghton Mifflin
workwithdata.com
Updated Apr 17, 2025
+ more versions
Cite
Work With Data (2025). Dataset of books published by Houghton Mifflin [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book_publisher&fop0=%3D&fval0=Houghton+Mifflin
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 534 rows and is filtered where the book publisher is Houghton Mifflin. It features 7 columns including author, publication date, language, and book publisher.
Number of books read yearly by U.S. consumers 2021, by gender
statista.com
Updated Jul 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Number of books read yearly by U.S. consumers 2021, by gender [Dataset]. https://www.statista.com/statistics/896508/number-of-books-consumers-read-per-year-by-gender/
Dataset updated
Jul 11, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Dec 1, 2021 - Dec 16, 2021
Area covered
United States
Description
As of December 2021, just ** percent of surveyed women in the United States said that they had not read any books in the last year, ten percent less than the share of men who said the same. Both male and female respondents were most likely to have read *** to **** books in the year leading to the survey, though **** percent of women reported having read more than ** books in that time.
Primary Schools Text Books - Dataset - openAFRICA
open.africa
Updated Nov 10, 2015
+ more versions
Cite
(2015). Primary Schools Text Books - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/table-51-primary-schools-text-books
Dataset updated
Nov 10, 2015
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The Ministry of Education's Basic Education Statistical Booklet captures national statistics for the education sector in totality. This dataset highlights the number of primary school textbooks per subject in each county. Source: Table 51 - Primary Schools Text Books
Dataset of books published by Bantam Books
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books published by Bantam Books [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book_publisher&fop0=%3D&fval0=Bantam+Books
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 377 rows and is filtered where the book publisher is Bantam Books. It features 7 columns including author, publication date, language, and book publisher.

Best Books Ever Dataset: 3 scholarly articles cite this dataset.
