100+ datasets found
  1. Best Books Ever Dataset

    • zenodo.org
    csv
    Updated Nov 10, 2020
    Cite
    Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
    Available download formats
    csv
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lorena Casanova Lozano; Sergio Costa Planells
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset was collected for Prac1 of the subject Typology and Data Life Cycle of the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).

    The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).

    The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset

    The data was retrieved in two sets: the first 30000 books and then the remaining 22478. Dates were not parsed and reformatted in the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and in Month Day Year format for the rest.
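
Because the two chunks store dates differently, a consumer of the file will probably want to normalize them before any analysis. The following is a minimal sketch, not the authors' code: the function name `parse_publish_date`, the exact `strptime` patterns, and the handling of ordinal suffixes such as "14th" are assumptions made for illustration.

```python
import re
from datetime import date, datetime
from typing import Optional

# The two layouts named in the description. The exact patterns are assumptions:
# "%m/%d/%Y" for mm/dd/yyyy and "%B %d %Y" for "Month Day Year".
FORMATS = ("%m/%d/%Y", "%B %d %Y")


def parse_publish_date(raw: str) -> Optional[date]:
    """Parse a publishDate/firstPublishDate string, or return None if it fits neither layout."""
    text = raw.strip().replace(",", "")
    # Assumption: day numbers may carry ordinal suffixes ("1st", "14th"); drop them if present.
    text = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", text)
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


print(parse_publish_date("09/14/2008"))         # first-chunk style
print(parse_publish_date("September 14 2008"))  # second-chunk style
```

Returning `None` instead of raising keeps unparseable values visible, so they can be counted and inspected rather than silently dropped.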

    Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repo.

    The 25 fields of the dataset are:

    | Attribute | Definition | Completeness (%) |
    | ------------- | ------------- | ------------- |
    | bookId | Book identifier as in goodreads.com | 100 |
    | title | Book title | 100 |
    | series | Series name | 45 |
    | author | Book's author | 100 |
    | rating | Global Goodreads rating | 100 |
    | description | Book's description | 97 |
    | language | Book's language | 93 |
    | isbn | Book's ISBN | 92 |
    | genres | Book's genres | 91 |
    | characters | Main characters | 26 |
    | bookFormat | Type of binding | 97 |
    | edition | Type of edition (e.g. Anniversary Edition) | 9 |
    | pages | Number of pages | 96 |
    | publisher | Publisher | 93 |
    | publishDate | Publication date | 98 |
    | firstPublishDate | Publication date of first edition | 59 |
    | awards | List of awards | 20 |
    | numRatings | Total number of ratings | 100 |
    | ratingsByStars | Number of ratings by stars | 97 |
    | likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
    | setting | Story setting | 22 |
    | coverImg | URL to cover image | 99 |
    | bbeScore | Score in Best Books Ever list | 100 |
    | bbeVotes | Number of votes in Best Books Ever list | 100 |
    | price | Book's price (extracted from Iberlibro) | 73 |
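
The description does not say how the completeness figures were computed. One plausible reading is the share of rows with a non-empty value in each column, and the sketch below recomputes it on that assumption. The function name `completeness` and the three-column sample are hypothetical, and the authors may have treated values such as empty lists differently.

```python
import csv
import io
from typing import Dict


def completeness(csv_text: str) -> Dict[str, int]:
    """Percent of rows holding a non-empty value, per column, rounded to an integer."""
    reader = csv.DictReader(io.StringIO(csv_text))
    filled = {name: 0 for name in (reader.fieldnames or [])}
    total = 0
    for row in reader:
        total += 1
        for name in filled:
            # Assumption: a missing or whitespace-only cell counts as incomplete.
            if (row.get(name) or "").strip():
                filled[name] += 1
    return {name: (round(100 * n / total) if total else 0) for name, n in filled.items()}


# Hypothetical four-row sample using three of the field names from the table.
sample = "bookId,title,series\n1,A,S1\n2,B,\n3,C,\n4,D,S2\n"
print(completeness(sample))
```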

  2. institutional-books-1.0

    • huggingface.co
    Updated Jun 11, 2025
    + more versions
    Cite
    Institutional Data Initiative (2025). institutional-books-1.0 [Dataset]. https://huggingface.co/datasets/institutional/institutional-books-1.0
    Dataset updated
    Jun 11, 2025
    Dataset authored and provided by
    Institutional Data Initiative
    Description

    📚 Institutional Books 1.0

    Institutional Books is a growing corpus of public domain books. This 1.0 release is comprised of 983,004 public domain books digitized as part of Harvard Library's participation in the Google Books project and refined by the Institutional Data Initiative. Use of this data is governed by the IDI Terms of Use for Early-Access.

    • 983K books, published largely in the 19th and 20th centuries
    • 242B o200k_base tokens
    • 386M pages of text, available in both original…

    See the full description on the dataset page: https://huggingface.co/datasets/institutional/institutional-books-1.0.

  3. Books Dataset

    • figshare.com
    txt
    Updated Jan 19, 2016
    Cite
    Giuseppe Mendola (2016). Books Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.1441255.v1
    Available download formats
    txt
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Giuseppe Mendola
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This database contains information about books gathered with the help of the Google Books API. It contains 7 tables, 3 of which serve only to relate the other tables to one another. Tables: Books contains 1062 records. Authors contains 1595 records. Categories contains 109 records. Metadata contains 37 records. MD5 (GBooks_2015-06-09.sql) = bfd09094d0e123e668b2e58332b1a98b
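
The published MD5 value lets a downloader confirm that the SQL dump arrived intact. The sketch below shows one way to do that with Python's standard `hashlib`. The helper names `md5_of_file` and `matches_published_checksum` are hypothetical, and the local file path is whatever the reader saved the dump as.

```python
import hashlib

# Checksum quoted in the dataset description for GBooks_2015-06-09.sql.
EXPECTED_MD5 = "bfd09094d0e123e668b2e58332b1a98b"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str) -> bool:
    """True if the file at `path` has the MD5 value given in the description."""
    return md5_of_file(path) == EXPECTED_MD5
```

A call such as `matches_published_checksum("GBooks_2015-06-09.sql")` would then report whether the local copy matches. MD5 is adequate for detecting accidental corruption but not deliberate tampering.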

  4. Data from: Christian Books Dataset

    • kaggle.com
    Updated Sep 8, 2023
    Cite
    Chibuzor Nwachukwu (2023). Christian Books Dataset [Dataset]. https://www.kaggle.com/datasets/chibuzornwachukwu/christian-books-dataset
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Chibuzor Nwachukwu
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The dataset contains metadata for over 200,000 Christian books and papers covering a wide range of genres and topics. The data was collected from a variety of sources, including online retailers, libraries, and publishers.

    This dataset contains a series of metadata for Christian books, including the following fields:

    • title: The title of the book.
    • author_name: The author of the book.
    • publisher: The publisher of the book.
    • publish_date: The date the book was published.
    • publish_place: The place where the book was published.
    • isbn: The International Standard Book Number (ISBN) of the book.
    • genre: The genre of the book, such as fiction, non-fiction, or theology.
    • ia_collection: Internet Archive Collection.
    • first_sentence
    • language: The languages in which the books are written.
    • currently_reading_count:
    • edition_count: Number of recognized publications of books/papers
    • edition_key:
    • number_of_pages_median: Median number of pages in books …

  5. Hindawi-Books-dataset

    • huggingface.co
    Updated Jul 31, 2023
    Cite
    Ali El Filali (2023). Hindawi-Books-dataset [Dataset]. https://huggingface.co/datasets/alielfilali01/Hindawi-Books-dataset
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 31, 2023
    Authors
    Ali El Filali
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset Card for "Hindawi Books Dataset"

    Hindawi Books Dataset is a large collection of more than 3000 books written in Modern Standard Arabic.

    Dataset Description

    Hindawi Books Dataset offers a rich and diverse collection of literary works, covering various topics and genres, all written in Modern Standard Arabic. The dataset includes information about each book, such as the title, author name, book abstract, and a link to access the complete text online. Additionally… See the full description on the dataset page: https://huggingface.co/datasets/alielfilali01/Hindawi-Books-dataset.

  6. Data from: arabic-books

    • huggingface.co
    Updated Nov 28, 2024
    Cite
    Mohamed Rashad (2024). arabic-books [Dataset]. https://huggingface.co/datasets/MohamedRashad/arabic-books
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 28, 2024
    Authors
    Mohamed Rashad
    License

    GPL-3.0: https://choosealicense.com/licenses/gpl-3.0/

    Description

    Arabic Books

    Dataset Summary

    The arabic-books dataset contains 8,500 rows of text, each representing the full text of a single Arabic book. These texts were extracted using the arabic-large-nougat model, showcasing the model’s capabilities in Arabic OCR and text extraction. The dataset spans a total of 1.1 billion tokens, calculated using the GPT-4 tokenizer. This dataset is a testimony to the quality of the Arabic Nougat models and their effectiveness in extracting… See the full description on the dataset page: https://huggingface.co/datasets/MohamedRashad/arabic-books.

  7. Oriented Books Dataset

    • universe.roboflow.com
    zip
    Updated Jul 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    koteitan (2024). Oriented Books Dataset [Dataset]. https://universe.roboflow.com/koteitan/oriented-books
    Available download formats
    zip
    Dataset updated
    Jul 4, 2024
    Dataset authored and provided by
    koteitan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    Book Bounding Boxes
    Description

    Oriented Books

    ## Overview

    Oriented Books is a dataset for object detection tasks - it contains Book annotations for 661 images.

    ## Getting Started

    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

    ## License

    This dataset is available under the [MIT license](https://creativecommons.org/licenses/MIT).
    
  8. Book consumption in the U.S. 2011-2021, by format

    • statista.com
    Updated Jun 24, 2025
    Cite
    Statista (2025). Book consumption in the U.S. 2011-2021, by format [Dataset]. https://www.statista.com/statistics/222754/book-format-used-by-readers-in-the-us/
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    Reading books remains a popular pastime for U.S. adults, with ** percent of respondents to a 2021 survey saying that they had read a book in any format within the last year. Despite online media formats now being the preferred option for many consumers when it comes to television, music, and gaming, print books are by far the most popular format among readers in the United States. Whilst almost double the share of adults now read audiobooks compared to 2011, only ** percent claimed to have read an audiobook in the last year compared to ** percent who said that they had read a print book.

    Book sales in the United States

    In 2020, bookstore sales in the United States amounted to **** billion U.S. dollars. Sales in 2019 and 2020 were the lowest recorded since the early *****, and the combined effect of the coronavirus outbreak, along with the growing appeal of online purchasing, will likely mean that bookstore sales will continue to drop. Bookstores tend to see most success in August, December, and January, and sales revenue often surpasses *********** U.S. dollars in those months each year. That said, monthly retail sales of bookstores in the U.S. are notably lower overall than in previous years and were particularly poor in spring 2020 as a result of national shutdowns to stem the spread of COVID-19.

    Influence of COVID-19 on reading habits

    The coronavirus pandemic led to increased media consumption in general, but not only among avid video and music streaming fans. Data from a survey in March 2020 revealed that ** percent of Millennials read more books due to the COVID-19 outbreak, making consumers in this group the most likely to have done so compared to ** percent of the total survey sample. Meanwhile, ** percent of Boomers said that their reading habits had not changed.

  9. Google Books Dataset

    • kaggle.com
    Updated Nov 25, 2019
    Cite
    Bilal Yussef (2019). Google Books Dataset [Dataset]. https://www.kaggle.com/datasets/bilalyussef/google-books-dataset/code
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 25, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bilal Yussef
    Description

    Context

    This data was gathered as part of the data mining project for General Assembly Data Science Immersive course.

    Content

    This data was acquired from the Google Books store using the Google API. Nine features were gathered for each book in the data set. The column names are mostly self-explanatory; nevertheless, they are explained below.

    1. title : the title of the book.
    2. authors : names of the authors of the book (might include more than one author).
    3. language : the language of the book.
    4. generes\categories : the categories associated with the book (by the Google store).
    5. rating\averageRating : the average rating of each book out of 5.
    6. maturityRating : whether the content of the book is for a MATURE or NOT MATURE audience.
    7. publisher : the name of the publisher.
    8. publishedDate : when the book was published.
    9. pageCount : number of pages of the book.
    10. voters : the number of voters for the book.
    11. ISBN : the unique identifier for each book.
    12. description : brief introductory description of the book.
    13. price : price of the book on the Google Books store.
    14. currency : the currency of the price in the Google Books store.

    Acknowledgements

    I would like to thank Google for making a free API available for their services and websites. I would also like to acknowledge the effort of the Web Scraper extension developer; it is a really nice and powerful tool for web scraping.

    Licenses

    ©2019 Google

    Inspiration

    Here is a story: you love reading books, and recently you bought a book that you thought you would like. However, after reading half the book, you still don't feel the enthusiasm and joy you expected. I think that machine learning algorithms might help solve such problems.

  10. Dataset of author, book publisher and ISBN of books

    • workwithdata.com
    Updated Apr 17, 2025
    + more versions
    Cite
    Work With Data (2025). Dataset of author, book publisher and ISBN of books [Dataset]. https://www.workwithdata.com/datasets/books?col=author%2Cbook%2Cbook_publisher%2Cisbn
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 2,617,384 rows. It features 4 columns: author, book, book publisher, and ISBN. It is 97% filled with non-null values.

  11. Dataset of books called Genre

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Genre [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Genre
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 6 rows and is filtered where the book is Genre. It features 7 columns including author, publication date, language, and book publisher.

  12. Google Books Ngrams

    • registry.opendata.aws
    Updated Apr 20, 2018
    Cite
    Not managed (2018). Google Books Ngrams [Dataset]. https://registry.opendata.aws/google-ngrams/
    Dataset updated
    Apr 20, 2018
    Dataset provided by
    Not managed
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    N-grams are fixed-size tuples of items. In this case the items are words extracted from the Google Books corpus. The n specifies the number of elements in the tuple, so a 5-gram contains five words or characters. The n-grams in this dataset were produced by passing a sliding window over the text of books and outputting a record for each new token.
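
The sliding-window idea can be illustrated in a few lines of Python. This is only a sketch of the general technique, not the pipeline that produced the dataset: the function name `ngrams` and the whitespace tokenization of the sample sentence are assumptions.

```python
from typing import Iterator, List, Tuple


def ngrams(tokens: List[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every run of n consecutive tokens, moving the window one token at a time."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for start in range(len(tokens) - n + 1):
        yield tuple(tokens[start:start + n])


# Assumption: plain whitespace splitting stands in for the corpus tokenizer.
words = "the quick brown fox jumps".split()
print(list(ngrams(words, 2)))
```

A text of k tokens yields k - n + 1 n-grams, and none at all when the text is shorter than n.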

  13. Preferred book formats in the U.S. 2020

    • statista.com
    Updated Mar 10, 2025
    Cite
    Statista (2025). Preferred book formats in the U.S. 2020 [Dataset]. https://www.statista.com/statistics/299074/book-consumption-per-capita-print-ebook-usa/
    Dataset updated
    Mar 10, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 28, 2020 - Apr 27, 2020
    Area covered
    United States
    Description

    According to a survey held in the United States between March and April 2020, 70 percent of respondents said that they read print books the most, with 39 percent of those consumers preferring their books to be new.

    The study was conducted as the U.S. went into lockdown to prevent the spread of the coronavirus. Although the virus certainly affected media consumption in the United States, consumers' book preferences did not change. Print has always been the most popular book format in the U.S., and figures on increased media consumption during the pandemic showed that even Gen Z, a generation famed for loving digital, were the most likely to be reading books more than usual during the outbreak.

    Book consumption in the U.S.

    Whilst printed newspapers and magazines have struggled to survive as digital formats grow ever more prevalent and appealing, when it comes to books U.S. consumers still have a clear preference for print. Annual survey data consistently shows that U.S. adults are far more likely to have read a print book in the last year than a digital version thereof, and whilst the popularity of digital books has increased, print remains the favorite.

    As far as book buying goes, whilst the number of print books sold in the U.S. fluctuates each year, the figures remain relatively stable. Although unit sales have not surpassed 700 million since 2010, the number came close in 2018 and yearly sales from 2015 to 2019 were higher than the amount recorded in 2004.

  14. Books, Minds, and Bodies dataset

    • ora.ox.ac.uk
    Updated Jan 1, 2022
    Cite
    Troscianko, E; Carney, J; Holman, E (2022). Books, Minds, and Bodies dataset [Dataset]. http://doi.org/10.5287/bodleian:gJZz9KDE0
    Available download formats
    (10133), (124412), (10276), (41302)
    Dataset updated
    Jan 1, 2022
    Dataset provided by
    University of Oxford
    Authors
    Troscianko, E; Carney, J; Holman, E
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    These data were gathered during the Books, Minds, and Bodies research project in 2015-16. The project was designed to investigate the therapeutic potential of shared reading, and involved running 2 reading groups over 2 consecutive terms and recording participants' discussions of the texts being read aloud together. These recordings were subsequently transcribed and used for analysis of emotional variance and linguistic similarity.

    Consistent with the ethical approval granted for the study, word order in the transcripts has been randomized so as to preclude any personal data being disclosed. This was done by tokenizing the text of each transcript into grammatical and lexical units (i.e. punctuation signs and words). These were shuffled using the "Random" module in the Python programming language, which provides a range of mathematical operations for collections of discrete objects. Nevertheless, grouping variables were preserved at the level of group (MT and HT terms) and session ID. As the calculation of values for emotional variance (on the dimensions of valence, arousal, and dominance) does not require syntax to be preserved, randomizing the data in this way should not affect the future calculation of word norm values.
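
The randomization step described above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual script: the function name `shuffle_transcript`, the regular expression used for tokenizing, and the optional `seed` parameter are all hypothetical. Only the use of Python's `random` module comes from the description.

```python
import random
import re
from typing import List, Optional

# Assumption: a token is either a word (optionally with an internal apostrophe)
# or a single punctuation sign.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def shuffle_transcript(text: str, seed: Optional[int] = None) -> List[str]:
    """Tokenize one transcript and return its tokens in random order."""
    tokens = TOKEN_PATTERN.findall(text)
    # A dedicated Random instance keeps the shuffle reproducible when a seed is given.
    random.Random(seed).shuffle(tokens)
    return tokens


shuffled = shuffle_transcript("Well, I liked it. I don't know why.", seed=0)
print(shuffled)
```

Shuffling each transcript separately leaves the group and session labels attached to the right tokens. Because the multiset of words is unchanged, per-word valence, arousal, and dominance lookups give the same totals as they would on the original text.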

    The dataset also includes text/discussion similarity calculations, qualitative coding results, and participants' post-participation feedback data.

    NB: this dataset replaces 'Books, Minds, and Bodies: raw transcript text plus VAD values' at https://ora.ox.ac.uk/objects/uuid:c370b75b-d37e-41be-89bb-cbb67a0c8614

  15. Data from: Arabic Books Dataset

    • universe.roboflow.com
    zip
    Updated Aug 12, 2024
    Cite
    Moayad alghamdi (2024). Arabic Books Dataset [Dataset]. https://universe.roboflow.com/moayad-alghamdi/arabic-books
    Available download formats
    zip
    Dataset updated
    Aug 12, 2024
    Dataset authored and provided by
    Moayad alghamdi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Paragraphs Bounding Boxes
    Description

    Don't worry about it, This is professionals work right here!

  16. goodreads-book-descriptions

    • huggingface.co
    Updated Jun 23, 2024
    Cite
    Book Souls (2024). goodreads-book-descriptions [Dataset]. https://huggingface.co/datasets/booksouls/goodreads-book-descriptions
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 23, 2024
    Authors
    Book Souls
    Description

    Goodreads Book Descriptions

    A dataset of English book titles and descriptions from Goodreads. The original dataset has 2.3 million books total with many more fields. There may exist a small number of non-English books in this dataset.

    Citations

    Mengting Wan, Julian McAuley, "Item Recommendation on Monotonic Behavior Chains", in RecSys'18. Mengting Wan, Rishabh Misra, Ndapa Nakashole, Julian McAuley, "Fine-Grained Spoiler Detection from Large-Scale Review Corpora", in… See the full description on the dataset page: https://huggingface.co/datasets/booksouls/goodreads-book-descriptions.

  17. Dataset of books published by Houghton Mifflin

    • workwithdata.com
    Updated Apr 17, 2025
    + more versions
    Cite
    Work With Data (2025). Dataset of books published by Houghton Mifflin [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book_publisher&fop0=%3D&fval0=Houghton+Mifflin
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 534 rows and is filtered where the book publisher is Houghton Mifflin. It features 7 columns including author, publication date, language, and book publisher.

  18. Number of books read yearly by U.S. consumers 2021, by gender

    • statista.com
    Updated Jul 11, 2025
    Cite
    Statista (2025). Number of books read yearly by U.S. consumers 2021, by gender [Dataset]. https://www.statista.com/statistics/896508/number-of-books-consumers-read-per-year-by-gender/
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 1, 2021 - Dec 16, 2021
    Area covered
    United States
    Description

    As of December 2021, just ** percent of surveyed women in the United States said that they had not read any books in the last year, ten percent less than the share of men who said the same. Both male and female respondents were most likely to have read *** to **** books in the year leading to the survey, though **** percent of women reported having read more than ** books in that time.

  19. Primary Schools Text Books - Dataset - openAFRICA

    • open.africa
    Updated Nov 10, 2015
    + more versions
    Cite
    (2015). Primary Schools Text Books - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/table-51-primary-schools-text-books
    Dataset updated
    Nov 10, 2015
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Ministry of Education's Basic Education Statistical Booklet captures national statistics for the education sector in its totality. This dataset highlights the number of primary school textbooks per subject in every county. Source: Table 51 - Primary Schools Text Books

  20. Dataset of books published by Bantam Books

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books published by Bantam Books [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book_publisher&fop0=%3D&fval0=Bantam+Books
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 377 rows and is filtered where the book publisher is Bantam Books. It features 7 columns including author, publication date, language, and book publisher.

Best Books Ever Dataset: 3 scholarly articles cite this dataset.