100+ datasets found
  1. Best Books Ever Dataset

    • zenodo.org
    csv
    Updated Nov 10, 2020
    Cite
    Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
    Available download formats: csv
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lorena Casanova Lozano; Sergio Costa Planells
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset was collected as part of Prac1 of the course Typology and Data Life Cycle in the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).

    The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).

    The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset

    The data was retrieved in two sets: the first 30000 books, then the remaining 22478. Dates in the second chunk were not parsed and reformatted, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and in Month Day Year format for the rest.
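
    Because the two chunks use different date formats, a reader will probably want to normalize them on load. The following is a minimal sketch, not the authors' code. It assumes the second chunk spells dates like "November 5 2003" or "November 5th 2003" (the exact spelling is not stated in the description), and the function name `parse_publish_date` is hypothetical.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumption: ordinal suffixes such as "5th" may appear in the second chunk.
# The lookbehind keeps month names like "August" untouched.
_ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def parse_publish_date(raw: str) -> Optional[date]:
    """Parse mm/dd/yyyy or 'Month Day Year'; return None if neither fits."""
    text = _ORDINAL.sub("", raw.strip())
    # %B matches English month names under the default C locale.
    for fmt in ("%m/%d/%Y", "%B %d %Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```

    Returning None instead of raising keeps unparseable values visible, so they can be counted and inspected rather than silently dropped.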

    Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repo.

    The 25 fields of the dataset are:

    | Attributes | Definition | Completeness |
    | ------------- | ------------- | ------------- | 
    | bookId | Book Identifier as in goodreads.com | 100 |
    | title | Book title | 100 |
    | series | Series Name | 45 |
    | author | Book's Author | 100 |
    | rating | Global goodreads rating | 100 |
    | description | Book's description | 97 |
    | language | Book's language | 93 |
    | isbn | Book's ISBN | 92 |
    | genres | Book's genres | 91 |
    | characters | Main characters | 26 |
    | bookFormat | Type of binding | 97 |
    | edition | Type of edition (ex. Anniversary Edition) | 9 |
    | pages | Number of pages | 96 |
    | publisher | Publisher name | 93 |
    | publishDate | Publication date | 98 |
    | firstPublishDate | Publication date of first edition | 59 |
    | awards | List of awards | 20 |
    | numRatings | Number of total ratings | 100 |
    | ratingsByStars | Number of ratings by stars | 97 |
    | likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
    | setting | Story setting | 22 |
    | coverImg | URL to cover image | 99 |
    | bbeScore | Score in Best Books Ever list | 100 |
    | bbeVotes | Number of votes in Best Books Ever list | 100 |
    | price | Book's price (extracted from Iberlibro) | 73 |
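
    The likedPercent definition in the table can be illustrated with a short sketch. It assumes the star counts are already available as a mapping from star value to count; how ratingsByStars is actually encoded in the CSV is not described here, so that representation and the name `liked_percent` are hypothetical. Any rounding applied by GoodReads is also not reproduced.

```python
from typing import Mapping, Optional


def liked_percent(ratings_by_stars: Mapping[int, int]) -> Optional[float]:
    """Percentage of ratings above 2 stars (that is, 3, 4, or 5 stars)."""
    total = sum(ratings_by_stars.values())
    if total == 0:
        return None  # no ratings, so the percentage is undefined
    liked = sum(count for stars, count in ratings_by_stars.items() if stars > 2)
    return 100.0 * liked / total
```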

  2. institutional-books-1.0

    • huggingface.co
    Updated Jun 11, 2025
    Cite
    Institutional Data Initiative (2025). institutional-books-1.0 [Dataset]. https://huggingface.co/datasets/institutional/institutional-books-1.0
    Dataset updated
    Jun 11, 2025
    Dataset authored and provided by
    Institutional Data Initiative
    Description

    📚 Institutional Books 1.0

    Institutional Books is a growing corpus of public domain books. This 1.0 release is comprised of 983,004 public domain books digitized as part of Harvard Library's participation in the Google Books project and refined by the Institutional Data Initiative. Use of this data is governed by the IDI Terms of Use for Early-Access.

    • 983K books, published largely in the 19th and 20th centuries
    • 242B o200k_base tokens
    • 386M pages of text, available in both original…

    See the full description on the dataset page: https://huggingface.co/datasets/institutional/institutional-books-1.0.

  3. Goodreads-books-with-genres

    • kaggle.com
    Updated Dec 30, 2022
    Cite
    MiddleLight (2022). Goodreads-books-with-genres [Dataset]. https://www.kaggle.com/datasets/middlelight/goodreadsbookswithgenres
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 30, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    MiddleLight
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Based on the dataset goodreadsbooks by jealousleopard. I used that dataset and simply added genre information to it; genres were added to all but 97 books. Note that I uploaded this dataset in 2022, but it is based on a dataset that has probably not been updated for a couple of years.

    To add the genres, I used PaulKlinger's "Enhance-GoodReads-Export" script with a few bug fixes, and enabled searching goodreads.com/en for data the script did not find under goodreads.com. This covered all books except about 400. For 300 of those, I extracted the genre data using a call to 'https://www.googleapis.com/books/v1/volumes?q=isbn:'. The remaining 97 I leave to the kind-hearted to fill.
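
    The ISBN lookup mentioned above can be sketched as follows. This is an illustration, not the uploader's script: the helper names are hypothetical, and how the returned categories were mapped to genres is not stated. The sketch only builds the request URL and reads the `categories` list that the Google Books volumes response carries under `items[].volumeInfo`; it performs no network call.

```python
from typing import Any, List, Mapping
from urllib.parse import quote

GOOGLE_BOOKS_VOLUMES = "https://www.googleapis.com/books/v1/volumes"


def isbn_query_url(isbn: str) -> str:
    """Build the ISBN lookup URL quoted in the description."""
    return f"{GOOGLE_BOOKS_VOLUMES}?q=isbn:{quote(isbn)}"


def extract_categories(response: Mapping[str, Any]) -> List[str]:
    """Collect distinct volumeInfo.categories values, preserving order."""
    seen: List[str] = []
    for item in response.get("items", []):
        for category in item.get("volumeInfo", {}).get("categories", []):
            if category not in seen:
                seen.append(category)
    return seen
```

    A response without an `items` key, which is what a lookup with no match looks like, yields an empty list rather than an error.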

  4. books-ratings

    • huggingface.co
    Updated Feb 7, 2024
    Cite
    Rootstrap (2024). books-ratings [Dataset]. https://huggingface.co/datasets/rootstrap-org/books-ratings
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    Rootstrap, Inc.
    Authors
    Rootstrap
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Book ratings

    This dataset has two files:

    • Books_rating.csv: information about book ratings made by users
    • books_data.csv: metadata about the books (title, author, genre, etc.)

    It is intended as an input dataset for training a recommender system. It was obtained from a dataset of Amazon book reviews on Kaggle.

  5. Books Dataset

    • figshare.com
    txt
    Updated Jan 19, 2016
    Cite
    Giuseppe Mendola (2016). Books Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.1441255.v1
    Available download formats: txt
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    figshare
    Authors
    Giuseppe Mendola
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This database contains information about books gathered with the help of the Google Books API. It contains 7 tables, 3 of which serve only to relate the other tables to one another.

    Tables:

    • Books: 1062 records
    • Authors: 1595 records
    • Categories: 109 records
    • Metadata: 37 records

    MD5 (GBooks_2015-06-09.sql) = bfd09094d0e123e668b2e58332b1a98b
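
    The published MD5 value makes it possible to check a downloaded copy. Here is a minimal sketch using Python's standard `hashlib`; the function names are hypothetical, and the local file path is whatever the reader saved the dump as.

```python
import hashlib

# Checksum quoted in the description for GBooks_2015-06-09.sql.
EXPECTED_MD5 = "bfd09094d0e123e668b2e58332b1a98b"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: str) -> bool:
    """True if the file's digest equals the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

    MD5 is adequate here for detecting a corrupted download; it is not a defense against deliberate tampering.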

  6. Data from: arabic-books

    • huggingface.co
    Updated Nov 28, 2024
    Cite
    Mohamed Rashad (2024). arabic-books [Dataset]. https://huggingface.co/datasets/MohamedRashad/arabic-books
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 28, 2024
    Authors
    Mohamed Rashad
    License

    https://choosealicense.com/licenses/gpl-3.0/

    Description

    Arabic Books

      Dataset Summary
    

    The arabic-books dataset contains 8,500 rows of text, each representing the full text of a single Arabic book. These texts were extracted using the arabic-large-nougat model, showcasing the model’s capabilities in Arabic OCR and text extraction. The dataset spans a total of 1.1 billion tokens, calculated using the GPT-4 tokenizer. This dataset is a testimony to the quality of the Arabic Nougat models and their effectiveness in extracting… See the full description on the dataset page: https://huggingface.co/datasets/MohamedRashad/arabic-books.

  7. Books_Dataset_text_generation

    • kaggle.com
    Updated Oct 19, 2023
    Cite
    Prashant Karwasra (2023). Books_Dataset_text_generation [Dataset]. https://www.kaggle.com/datasets/prashantkarwasra/books-dataset-text-generation
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 19, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Prashant Karwasra
    Description

    Dataset: "Harry Potter" and "The Lord of the Rings" Text Corpus

    This curated dataset on Kaggle comprises the entire collection of texts from the "Harry Potter" series by J.K. Rowling and "The Lord of the Rings" series, including "The Hobbit" and "The Silmarillion" by J.R.R. Tolkien. It serves as a comprehensive text corpus for training language models, particularly suited for text generation and natural language processing tasks.

    About the Dataset:

    Content: The dataset includes the complete text corpus from the "Harry Potter" series (seven books) and "The Lord of the Rings" trilogy (including "The Hobbit" and "The Silmarillion"). These iconic fantasy novels feature rich narratives, diverse characters, and intricate world-building.

    Format: The texts are provided in plain text format, divided into separate files or sections corresponding to each book/chapter.

    Usage:

    The combined collection of "Harry Potter" and "The Lord of the Rings" texts within this dataset offers a wealth of literary content for training language models, particularly for tasks like text generation, language modeling, sentiment analysis, and more. The dataset's diverse narratives and unique writing styles contribute to building robust and contextually aware language models.

  8. Dataset of author, book publisher and ISBN of books

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of author, book publisher and ISBN of books [Dataset]. https://www.workwithdata.com/datasets/books?col=author%2Cbook%2Cbook_publisher%2Cisbn
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 2,617,384 rows. It features 4 columns: author, book, book publisher, and ISBN. It is 97% filled with non-null values.

  9. Books Borrowed - E-Library

    • data.yorkopendata.org
    Updated Feb 11, 2016
    Cite
    (2016). Books Borrowed - E-Library [Dataset]. https://data.yorkopendata.org/dataset/kpi-lib02-r
    Dataset updated
    Feb 11, 2016
    License

    Open Government Licence 2.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/
    License information was derived automatically

    Description

    Books Borrowed - E-Library. Resources: CSV. Performance Indicator: LIB02-R (Books Borrowed - E-Library).

  10. Book consumption in the U.S. 2011-2021, by format

    • statista.com
    Updated Jun 24, 2025
    Cite
    Statista (2025). Book consumption in the U.S. 2011-2021, by format [Dataset]. https://www.statista.com/statistics/222754/book-format-used-by-readers-in-the-us/
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    Reading books remains a popular pastime for U.S. adults, with ** percent of respondents to a 2021 survey saying that they had read a book in any format within the last year. Despite online media formats now being the preferred option for many consumers when it comes to television, music, and gaming, print books are by far the most popular format among readers in the United States. Whilst almost double the share of adults now read audiobooks compared to 2011, only ** percent claimed to have read an audiobook in the last year compared to ** percent who said that they had read a print book.

    Book sales in the United States

    In 2020, bookstore sales in the United States amounted to **** billion U.S. dollars. Sales in 2019 and 2020 were the lowest recorded since the early *****, and the combined effect of the coronavirus outbreak, along with the growing appeal of online purchasing, will likely mean that bookstore sales will continue to drop. Bookstores tend to see most success in August, December, and January, and sales revenue often surpasses *********** U.S. dollars in those months each year. That said, monthly retail sales of bookstores in the U.S. are notably lower overall than in previous years and were particularly poor in spring 2020 as a result of national shutdowns to stem the spread of COVID-19.

    Influence of COVID-19 on reading habits

    The coronavirus pandemic led to increased media consumption in general, but not only among avid video and music streaming fans. Data from a survey in March 2020 revealed that ** percent of Millennials read more books due to the COVID-19 outbreak, making consumers in this group the most likely to have done so compared to ** percent of the total survey sample. Meanwhile, ** percent of Boomers said that their reading habits had not changed.

  11. Book-Crossing Dataset

    • kaggle.com
    zip
    Updated Sep 7, 2019
    Cite
    somnambWl (2019). Book-Crossing Dataset [Dataset]. https://www.kaggle.com/datasets/somnambwl/bookcrossing-dataset
    Available download formats: zip (17632108 bytes)
    Dataset updated
    Sep 7, 2019
    Authors
    somnambWl
    Description

    Book-Crossing dataset mined by Cai-Nicolas Ziegler

    Freely available for research use when acknowledged with the following reference (further details on the dataset are given in this publication):

    • Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen; Proceedings of the 14th International World Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan. To appear.

    Further information and the original dataset can be found at the original webpage.

    Changes to the dataset:

    • Location removed, as it does not always follow the default format (city, state, country).
    • Transferred from ISO-8859-1 to UTF-8
    • Manually fixed a few rows with incorrect number of columns
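
    The encoding change listed above can be reproduced on the original files with a few lines of standard-library Python. This is a sketch of one way to do it, not the uploader's actual procedure; the function name and file paths are hypothetical.

```python
def latin1_to_utf8(src_path: str, dst_path: str) -> None:
    """Re-encode a text file from ISO-8859-1 to UTF-8, line by line."""
    # newline="" disables newline translation, so line endings are preserved.
    with open(src_path, "r", encoding="iso-8859-1", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

    Since every byte value is valid ISO-8859-1, decoding cannot fail; a file that was not really ISO-8859-1 would convert without error but produce wrong characters.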

    Note:

    • out of 278859 users:
      • only 99053 rated at least 1 book
      • only 43385 rated at least 2 books.
      • only 12306 rated at least 10 books.
    • out of 271379 books:
      • only 270171 are rated at least once.
      • only 124513 have at least 2 ratings.
      • only 17480 have at least 10 ratings.
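
    Counts like the ones in this note can be derived from the ratings table with a frequency count. The sketch below assumes ratings are available as (user_id, isbn) pairs; that layout and the function name are hypothetical, and whether the figures above include implicit ratings is not stated.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_with_at_least(
    ratings: Iterable[Tuple[str, str]], position: int, minimum: int
) -> int:
    """Count distinct users (position 0) or books (position 1)
    that appear in at least `minimum` rating pairs."""
    per_key = Counter(pair[position] for pair in ratings)
    return sum(1 for n in per_key.values() if n >= minimum)
```

    For example, `count_with_at_least(pairs, 0, 10)` gives the number of users with at least 10 ratings, which is a common cutoff when preparing data for a recommender system.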

  12. Data from: Damaged Books Dataset

    • universe.roboflow.com
    zip
    Updated Oct 15, 2023
    Cite
    atcom21@gmail.com (2023). Damaged Books Dataset [Dataset]. https://universe.roboflow.com/atcom21-gmail-com/damaged-books/dataset/1
    Available download formats: zip
    Dataset updated
    Oct 15, 2023
    Dataset authored and provided by
    atcom21@gmail.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Damages Bounding Boxes
    Description

    Damaged Books

    ## Overview

    Damaged Books is a dataset for object detection tasks. It contains Damages annotations for 620 images.

    ## Getting Started

    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  13. Dataset of books published by The authors

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books published by The authors [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book_publisher&fop0=%3D&fval0=The+authors
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 132 rows and is filtered where the book publisher is The authors. It features 7 columns including author, publication date, language, and book publisher.

  14. Data from: All Books Dataset

    • universe.roboflow.com
    zip
    Updated Aug 4, 2023
    Cite
    Zebra Learn (2023). All Books Dataset [Dataset]. https://universe.roboflow.com/zebra-learn/all-books-mumha/dataset/1
    Available download formats: zip
    Dataset updated
    Aug 4, 2023
    Dataset authored and provided by
    Zebra Learn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Book Bounding Boxes
    Description

    All Books

    ## Overview

    All Books is a dataset for object detection tasks. It contains Book annotations for 2,070 images.

    ## Getting Started

    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  15. Books Dataset

    • universe.roboflow.com
    zip
    Updated Dec 16, 2024
    Cite
    YIL (2024). Books Dataset [Dataset]. https://universe.roboflow.com/yil/books-vty7g/model/2
    Available download formats: zip
    Dataset updated
    Dec 16, 2024
    Dataset authored and provided by
    YIL
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Novel Scientific 12 16 Bounding Boxes
    Description

    Books

    ## Overview

    Books is a dataset for object detection tasks. It contains Novel Scientific 12 16 annotations for 884 images.

    ## Getting Started

    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  16. fiction-books

    • huggingface.co
    Updated Feb 15, 2024
    Cite
    Aleksey Korshuk (2024). fiction-books [Dataset]. https://huggingface.co/datasets/AlekseyKorshuk/fiction-books
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 15, 2024
    Authors
    Aleksey Korshuk
    Description

    AlekseyKorshuk/fiction-books dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. Dataset of books published by Hachette Books

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books published by Hachette Books [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book_publisher&fop0=%3D&fval0=Hachette+Books
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 43 rows and is filtered where the book publisher is Hachette Books. It features 7 columns including author, publication date, language, and book publisher.

  18. Image–Text Pair Dataset from Books - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). Image–Text Pair Dataset from Books - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/image-text-pair-dataset-from-books
    Dataset updated
    Dec 16, 2024
    Description

    A dataset constructed from book images using an optical character reader (OCR), an object detector, and a layout analyzer for the autonomous extraction of image–text pairs.

  19. Preferred book formats in the U.S. 2020

    • statista.com
    Updated Mar 10, 2025
    Cite
    Statista (2025). Preferred book formats in the U.S. 2020 [Dataset]. https://www.statista.com/statistics/299074/book-consumption-per-capita-print-ebook-usa/
    Dataset updated
    Mar 10, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 28, 2020 - Apr 27, 2020
    Area covered
    United States
    Description

    According to a survey held in the United States between March and April 2020, 70 percent of respondents said that they read print books the most, with 39 percent of those consumers preferring their books to be new.

    The study was conducted as the U.S. went into lockdown to prevent the spread of the coronavirus. Although the virus certainly affected media consumption in the United States, consumers' book preferences did not change. Print has always been the most popular book format in the U.S., and figures on increased media consumption during the pandemic showed that even Gen Z, a generation famed for loving digital, were the most likely to be reading books more than usual during the outbreak.

    Book consumption in the U.S.

    Whilst printed newspapers and magazines have struggled to survive as digital formats grow ever more prevalent and appealing, when it comes to books U.S. consumers still have a clear preference for print. Annual survey data consistently shows that U.S. adults are far more likely to have read a print book in the last year than a digital version thereof, and whilst the popularity of digital books has increased, print remains the favorite.

    As far as book buying goes, whilst the number of print books sold in the U.S. fluctuates each year, the figures remain relatively stable. Although unit sales have not surpassed 700 million since 2010, the number came close in 2018 and yearly sales from 2015 to 2019 were higher than the amount recorded in 2004.

  20. Books Cover 2 [backup] Dataset

    • universe.roboflow.com
    zip
    Updated May 30, 2024
    Cite
    Books Cover Dataset (2024). Books Cover 2 [backup] Dataset [Dataset]. https://universe.roboflow.com/books-cover-dataset/books-cover-dataset-2-backup
    Available download formats: zip
    Dataset updated
    May 30, 2024
    Dataset authored and provided by
    Books Cover Dataset
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Books OS6z Bounding Boxes
    Description

    Books Cover Dataset 2 [Backup]

    ## Overview

    Books Cover Dataset 2 [Backup] is a dataset for object detection tasks. It contains Books OS6z annotations for 520 images.

    ## Getting Started

    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

    ## License

    This dataset is available under the [CC0 1.0 Universal Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/).
    
5 scholarly articles cite the Best Books Ever Dataset (View in Google Scholar)