
Best Books Ever Dataset
zenodo.org
csv
Updated Nov 10, 2020
Cite
Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
3 scholarly articles cite this dataset (per Google Scholar)
Available download formats
csv
Unique identifier
https://doi.org/10.5281/zenodo.4265096
Dataset updated
Nov 10, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Lorena Casanova Lozano; Sergio Costa Planells
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The dataset was collected as part of Prac1 of the course Typology and Data Life Cycle in the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).

The dataset contains 25 variables and 52478 records corresponding to books on the Goodreads Best Books Ever list (the largest list on the site).

The original code used to retrieve the dataset is available in the GitHub repository github.com/scostap/goodreads_bbe_dataset.

The data was retrieved in two sets: the first 30000 books, then the remaining 22478. Dates in the second chunk were not parsed and reformatted, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and in Month Day Year format for the rest.
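
Because the two chunks use different date formats, the dates need to be normalised before any time-based analysis. The following is a minimal sketch that assumes the textual form looks like "September 14th 2008". The ordinal suffix and the English month names are assumptions not confirmed by the description, and the function name `parse_bbe_date` is hypothetical.

```python
import re
from datetime import date, datetime
from typing import Optional

# strptime has no directive for ordinal suffixes ("14th"), so they are stripped first.
_ORDINAL = re.compile(r"(\d+)(st|nd|rd|th)\b")

# mm/dd/yyyy (first 30000 records), then "Month Day Year" (remaining records).
# %B assumes English month names, as in the default C locale.
_FORMATS = ("%m/%d/%Y", "%B %d %Y")


def parse_bbe_date(raw: object) -> Optional[date]:
    """Normalise a publishDate/firstPublishDate value to datetime.date, or None."""
    if not isinstance(raw, str):
        return None  # missing values, e.g. NaN read by pandas
    text = _ORDINAL.sub(r"\1", raw.strip())
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next known format
    return None  # neither format matched; inspect these rows separately
```

With pandas, the function could be applied as `df["publishDate"].map(parse_bbe_date)`. Values that match neither format come back as `None`, so partial dates (for example a bare year) can be inspected rather than silently misread.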

Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, with an example, is available in the GitHub repository.
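
The repository's own download script is not reproduced here. As an independent sketch using only the Python standard library, each non-empty `coverImg` URL can be fetched with `urllib.request.urlretrieve` and saved under a name derived from `bookId`. The helper names, the `.jpg` fallback extension and the file-naming scheme are assumptions for illustration.

```python
import os
import re
import urllib.request
from typing import Optional
from urllib.parse import urlparse

# Characters outside this set are replaced so that bookId values form safe file names.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def cover_filename(book_id: str, cover_url: str) -> str:
    """Build a local file name from bookId plus the extension of the URL path."""
    ext = os.path.splitext(urlparse(cover_url).path)[1] or ".jpg"  # .jpg fallback is assumed
    return _UNSAFE.sub("_", str(book_id)) + ext


def download_cover(book_id: str, cover_url: Optional[str], out_dir: str) -> Optional[str]:
    """Download one cover image and return its local path, or None if there is no URL."""
    if not isinstance(cover_url, str) or not cover_url.strip():
        return None  # coverImg is empty for roughly 1% of records
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, cover_filename(book_id, cover_url))
    urllib.request.urlretrieve(cover_url, path)
    return path
```

In practice one would loop over the rows with a pause between requests and catch `urllib.error.URLError`, so that a single dead link does not stop the run.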

The 25 fields of the dataset are:

| Attribute | Definition | Completeness (%) |
| ------------- | ------------- | ------------- |
| bookId | Book identifier as on goodreads.com | 100 |
| title | Book title | 100 |
| series | Series name | 45 |
| author | Book's author | 100 |
| rating | Global Goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (e.g. Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Publisher | 93 |
| publishDate | Publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Total number of ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field: percentage of ratings above 2 stars (as on Goodreads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL of cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |

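
The derived `likedPercent` field can be recomputed from `ratingsByStars` as a consistency check. The sketch below rests on three assumptions, none of which is stated in the description, so verify them against a few rows first:

- `ratingsByStars` holds five counts ordered from 5 stars down to 1 star.
- The CSV stores those counts as a Python-style list literal.
- The percentage is rounded to the nearest integer.

```python
import ast
from typing import List, Optional, Sequence


def parse_ratings_by_stars(raw: str) -> List[int]:
    """Parse a list literal such as "['10', '5', '2', '1', '2']" into integers."""
    # Assumed storage format; ast.literal_eval only evaluates literals, not arbitrary code.
    return [int(c) for c in ast.literal_eval(raw)]


def liked_percent(counts: Sequence[int]) -> Optional[int]:
    """Share of ratings above 2 stars (3, 4 or 5 stars), as a whole percentage.

    Assumes counts are ordered [5-star, 4-star, 3-star, 2-star, 1-star].
    """
    if len(counts) != 5:
        raise ValueError("expected five star-count buckets")
    total = sum(counts)
    if total == 0:
        return None  # no ratings, so the percentage is undefined
    # round() uses round-half-to-even; the site's own rounding rule is unknown.
    return round(100 * sum(counts[:3]) / total)
```

Comparing this value with the stored `likedPercent` on a handful of rows would show whether the ordering assumption holds.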
Books Dataset
figshare.com
txt
Updated Jan 19, 2016
Cite
Giuseppe Mendola (2016). Books Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.1441255.v1
Available download formats
txt
Unique identifier
https://doi.org/10.6084/m9.figshare.1441255.v1
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Giuseppe Mendola
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This database contains information about books gathered with the help of the Google Books API. It consists of 7 tables, 3 of which serve only to relate the other tables to one another. Tables: Books (1062 records), Authors (1595 records), Categories (109 records), Metadata (37 records). MD5 (GBooks_2015-06-09.sql) = bfd09094d0e123e668b2e58332b1a98b
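
The published MD5 checksum lets you confirm that a downloaded copy of `GBooks_2015-06-09.sql` is intact. A minimal sketch with Python's standard `hashlib` module follows; it reads the file in chunks so a large dump need not fit in memory. The function names are illustrative.

```python
import hashlib

# Checksum as published in the dataset description.
EXPECTED_MD5 = "bfd09094d0e123e668b2e58332b1a98b"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dump(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches the expected digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For an intact download, `verify_dump("GBooks_2015-06-09.sql")` should return `True`. MD5 is adequate for detecting accidental corruption, though not for guarding against deliberate tampering.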
General Record of Incidence of Mortality (GRIM) books
data.gov.au
researchdata.edu.au
csv
Updated Apr 14, 2025
Cite
Australian Institute of Health and Welfare (2025). General Record of Incidence of Mortality (GRIM) books [Dataset]. https://data.gov.au/data/dataset/grim-books
Available download formats
csv (25197618)
Dataset updated
Apr 14, 2025
Dataset authored and provided by
Australian Institute of Health and Welfare (http://www.aihw.gov.au/)
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
Data extracted in machine-readable form from the AIHW General Record of Incidence of Mortality (GRIM) books.

GRIM books are Excel workbooks that contain national level, historical and recent deaths data for specific causes of death. The tables present age- and sex-specific counts and rates by cause of death, along with other summary measures.
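
Rates and summary measures of this kind are derived from death counts and populations. The sketch below assumes rates are expressed per 100,000 population and shows direct age standardisation, which weights age-specific rates by a standard population. Check each GRIM book's own notes for whether and how it standardises; the standard-population weights passed in are the caller's choice, not the AIHW standard.

```python
from typing import Sequence


def crude_rate(deaths: int, population: int, per: int = 100_000) -> float:
    """Deaths per `per` population (per 100,000 by default, an assumed convention)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * per


def age_standardised_rate(
    deaths: Sequence[int],
    population: Sequence[int],
    standard_population: Sequence[int],
    per: int = 100_000,
) -> float:
    """Direct standardisation: average of age-specific rates weighted by a standard population."""
    if not len(deaths) == len(population) == len(standard_population):
        raise ValueError("need one value per age group in every sequence")
    total_standard = sum(standard_population)
    if total_standard <= 0:
        raise ValueError("standard population must be positive")
    weighted = sum(
        crude_rate(d, p, per) * s
        for d, p, s in zip(deaths, population, standard_population)
    )
    return weighted / total_standard
```

With made-up figures, `age_standardised_rate([10, 50], [100_000, 100_000], [3, 1])` gives approximately 20 deaths per 100,000. The first group's rate counts three times as much because the standard population has three times as many people in that group.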

GRIM books are available for all causes of death combined and for 55 other cause-of-death groupings. They span different years for different causes of death, depending on the data available. For some causes of death, GRIM books start in 1907, and they are the only national electronic tabulations of deaths data by cause registered before 1964. Data from 1964 onwards are sourced from the AIHW National Mortality Database. The books include mortality data up to 2023.

For more information, please see Deaths data at AIHW or contact us at deaths@aihw.gov.au.

Also available on data.gov.au are the AIHW Mortality Over Regions and Time (MORT) books.
Data from: NIST/ARPA-E Database of Novel and Emerging Adsorbent Materials
catalog.data.gov
datasets.ai
Updated Jul 29, 2022
Cite
National Institute of Standards and Technology (2022). NIST/ARPA-E Database of Novel and Emerging Adsorbent Materials [Dataset]. https://catalog.data.gov/dataset/nist-arpa-e-database-of-novel-and-emerging-adsorbent-materials-ad6ac
Dataset updated
Jul 29, 2022
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
The NIST/ARPA-E Database of Novel and Emerging Adsorbent Materials is a free, web-based catalog of adsorbent materials and their measured adsorption properties, compiled from articles in the scientific literature. Search fields include adsorbent material, adsorbate gas, experimental conditions (pressure, temperature), and bibliographic information (author, title, journal). Query results are returned as a list of articles matching the search parameters. The database also contains adsorption isotherms digitized from the cataloged articles, which can be compared visually in the web application or exported for offline analysis.
Japanese Novel Data
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Cite
Budwell, Jackson (2023). Japanese Novel Data [Dataset]. http://doi.org/10.7910/DVN/21YHPO
Unique identifier
https://doi.org/10.7910/DVN/21YHPO
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Budwell, Jackson
Description
A dataset on Japanese novels, containing 1113 novels and the following variables: title, words, unique words, unique words used once, UWUU%, kanji, kanji used once, kanji readings, difficulty, average sentence length, characters, publisher, pages, ASIN, and Japanese title. Note: not all observations have complete publisher, page, ASIN, and Japanese title data. The variables title, words, unique words, unique words used once, UWUU%, kanji, kanji used once, kanji readings, difficulty, average sentence length, and characters were obtained from Jpdb.io. Publisher, pages, ASIN, and Japanese title were obtained from Amazon.co.jp. The data was scraped using Python and BeautifulSoup (BS4) by CDT Budwell. This dataset was created in support of MA206 (Intro to Statistics) at USMA West Point by CDT Jackson Budwell '25.
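
Several of these variables are word- and character-frequency counts. As a hedged illustration, the sketch below computes comparable measures from a text that has already been split into words. Japanese text needs a morphological analyser for that step, which is out of scope here. The sketch also assumes two definitions that the description does not give, and Jpdb.io's exact definitions may differ:

- UWUU% is the share of unique words that occur exactly once.
- "Kanji" counts distinct characters in the CJK Unified Ideographs block.

```python
from collections import Counter
from typing import Dict, Iterable


def vocabulary_stats(tokens: Iterable[str]) -> Dict[str, float]:
    """Word-frequency measures for one text that is already split into words."""
    counts = Counter(tokens)
    unique = len(counts)
    used_once = sum(1 for n in counts.values() if n == 1)
    return {
        "words": sum(counts.values()),       # total running words
        "unique_words": unique,              # distinct words
        "unique_words_used_once": used_once, # distinct words occurring exactly once
        # Assumed definition of UWUU%: share of distinct words that occur only once.
        "uwuu_pct": 100 * used_once / unique if unique else 0.0,
    }


def kanji_stats(text: str) -> Dict[str, int]:
    """Distinct kanji, and distinct kanji appearing only once, in a raw text."""
    # Approximation: only the CJK Unified Ideographs block U+4E00..U+9FFF is counted.
    kanji = Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    return {
        "kanji": len(kanji),
        "kanji_used_once": sum(1 for n in kanji.values() if n == 1),
    }
```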
Data from: 1000PLUS Novels Corpus (1.0)
live.european-language-grid.eu
binary format
Updated Jul 18, 2019
Cite
(2019). 1000PLUS Novels Corpus (1.0) [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/8654
Available download formats
binary format
Dataset updated
Jul 18, 2019
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
A corpus of literary texts intended as a benchmark collection for text categorization. It contains 1000 novels by various authors, written in Polish or translated into Polish. It extends the 1000 Novels Corpus (http://hdl.handle.net/11321/312). Each text is stored as a separate .txt file with a .cmdi metadata description.
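
Because each novel comes as a `.txt` file plus a `.cmdi` metadata file, a first processing step is usually to pair them. The following is a minimal sketch that assumes the two files share the same base name, which the description does not confirm. Files whose base name appears with only one of the two extensions are reported rather than silently dropped. The function name is hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def pair_texts_with_metadata(
    corpus_dir: str,
) -> Tuple[Dict[str, Tuple[Path, Path]], List[str]]:
    """Match each <name>.txt with <name>.cmdi under corpus_dir (searched recursively).

    Returns (pairs keyed by file stem, sorted stems that lack one of the two files).
    If the same stem occurs in two subdirectories, the later match overwrites the earlier.
    """
    root = Path(corpus_dir)
    texts = {p.stem: p for p in root.rglob("*.txt")}
    metas = {p.stem: p for p in root.rglob("*.cmdi")}
    pairs = {stem: (texts[stem], metas[stem]) for stem in texts.keys() & metas.keys()}
    unmatched = sorted(texts.keys() ^ metas.keys())
    return pairs, unmatched
```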
Dataset of books about Database design
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books about Database design [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=j0-book_subject&fop0=%3D&fval0=Database+design&j=1&j0=book_subjects
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 321 rows, filtered to books whose subject is Database design. It features 9 columns, including author, publication date, language, and book publisher.
Dataset of books series that contain Logical database design principles
workwithdata.com
Updated Nov 25, 2024
Cite
Work With Data (2024). Dataset of books series that contain Logical database design principles [Dataset]. https://www.workwithdata.com/datasets/book-series?f=1&fcol0=j0-book&fop0=%3D&fval0=Logical+database+design+principles&j=1&j0=books
Dataset updated
Nov 25, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about book series. It has 2 rows, filtered to series containing the book Logical database design principles. It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.
txtLAB Contemporary Novel Data Set
figshare.com
txt
Updated Jan 19, 2016
Cite
Andrew Piper (2016). txtLAB Contemporary Novel Data Set [Dataset]. http://doi.org/10.6084/m9.figshare.2061990.v3
Available download formats
txt
Unique identifier
https://doi.org/10.6084/m9.figshare.2061990.v3
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Andrew Piper
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This table lists approximately 1200 novels representing different genres of contemporary writing. It is discussed in "How Cultural Capital Works," Post-45 (2016): http://post45.research.yale.edu
The Novel as Global Form Project: Critical Reception and Circulation Data
recerca.uoc.edu
dataverse.csuc.cat
Updated 2024
Cite
Bellido, Aitana; Ikoff, Ventsislav; Sangrà Bruguera, Marta; Puxan-Oliva, Marta; Rotger, Neus (2024). The Novel as Global Form Project: Critical Reception and Circulation Data [Dataset]. https://recerca.uoc.edu/documentos/67a9c7bb19544708f8c70c3f
Dataset updated
2024
Authors
Bellido, Aitana; Ikoff, Ventsislav; Sangrà Bruguera, Marta; Puxan-Oliva, Marta; Rotger, Neus
Description
The present dataset traces the critical reception and international circulation of a selection of contemporary novels published between 1989 and 2021 that can be productively read through the lens of the “global novel” debate. The archive contains more than 1500 data entries on the translation, circulation and consecration of the corpus novels. These include all original and translated editions, literary reviews and interviews with the authors in different countries, specialized academic works, awarded or short-listed literary prizes, and film and theater adaptations. They also include metadata on the agents involved in the novels' international circulation: translators, literary agents, foreign rights agents, and publishers. These data served as the basis for the qualitative case studies carried out within the framework of the project “The Novel as Global Form. Poetic Challenges and Cross-border Literary Circulation” (Spanish Research Agency, PID2020-118610GA-I00) of the Universitat Oberta de Catalunya (UOC) in Barcelona, Spain.

The compiled data respond to the need to answer project questions such as “How globally do ‘global authors’ actually circulate?”, “What paths of circulation and recognition emerge when we consider non-Anglophone authors?”, and “What is the role of gender in the writing and editorial process?”. The data therefore include parameters that address the project's research questions and may be uneven in other respects, especially geographical or linguistic representativeness, which was not the project’s focus.

The project's selected novels are the following: Norwegian Gert Nygårdshaug’s Mengele Zoo (1989); Georgian Aka Morchiladze’s მოგზაურობა ყარაბაღში (Journey to Karabakh, 1992); Colombian Juan Gabriel Vásquez’s Historia secreta de Costaguana (The Secret History of Costaguana, 2007); Polish Olga Tokarczuk’s Bieguni (Flights, 2008); Brazilian Patricia Melo’s O Ladrão de Cadáveres (The Body Snatcher, 2010); Argentinian Ariana Harwicz’s Mátate, amor (Die, My Love, 2012); South African and Australian J. M. Coetzee’s Jesus trilogy (The Childhood of Jesus, 2013, The Schooldays of Jesus, 2016, and The Death of Jesus, 2019); Nino Haratischwili’s Das achte Leben (für Brilka) (The Eighth Life (for Brilka), 2014); Brazilian Carla Madeira’s Tudo é rio (‘Everything Is River’, 2014); Argentinian Samanta Schweblin’s Distancia de rescate (Fever Dream, 2014); Italian Bruno Arpaia’s Qualcosa, là fuori (‘Something, Out There’, 2016); French Élisabeth Filhol’s Doggerland (2019); Lebanese Zena El Khalil’s Beirut, I Love You (2019); Catalan Irene Solà’s Canto jo i la muntanya balla (When I Sing, Mountains Dance, 2019); and Turkish-British Elif Shafak’s The Island of Missing Trees (2021).
Psy-Data-books
huggingface.co
Updated Jun 19, 2025
Cite
Ammar (2025). Psy-Data-books [Dataset]. https://huggingface.co/datasets/Daemontatox/Psy-Data-books
Dataset updated
Jun 19, 2025
Authors
Ammar
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
🧠 Psy-Data-Books: Synthetic Medical & Psychology Conversation Dataset

Psy-Data-Books is one of the largest synthetic datasets of psychology and medical conversations, generated from verified medical and psychology literature. It is designed for building and training powerful conversational AI systems for healthcare, therapy, and mental health applications.

📊 Dataset Summary

Domain: Psychology, Psychiatry, Mental Health, General Medicine
Data Type: Synthetic… See the full description on the dataset page: https://huggingface.co/datasets/Daemontatox/Psy-Data-books.
Global import data of Books
volza.com
csv
Updated Jun 24, 2025
Cite
Volza FZ LLC (2025). Global import data of Books [Dataset]. https://www.volza.com/p/books/import/import-in-united-states/
Available download formats
csv
Dataset updated
Jun 24, 2025
Dataset provided by
Volza
Authors
Volza FZ LLC
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Count of importers, Sum of import value, 2014-01-01/2021-09-30, Count of import shipments
Description
459554 global import shipment records for books, with prices, volumes, and current buyer-supplier relationships, based on an actual global export trade database.
Data from: ARAMEMNON, a Novel Database for Arabidopsis Integral Membrane...
neuinfo.org
scicrunch.org
Updated Jan 29, 2022
Cite
(2022). ARAMEMNON, a Novel Database for Arabidopsis Integral Membrane Proteins [Dataset]. http://identifiers.org/RRID:SCR_007552/resolver?q=&i=rrid
Unique identifier
https://identifiers.org/RRID:SCR_007552 https://identifiers.org/RRID:SCR_007552/resolver?q=&i=rrid
Dataset updated
Jan 29, 2022
Description
A database of putative membrane proteins of thale cress (Arabidopsis thaliana) and rice (Oryza sativa), plus about 6700 putative membrane proteins of ~300 other seed plants. The database stores data about:
* protein, cDNA and genomic sequences
* exon predictions (A. thaliana and O. sativa)
* different cDNA/protein models of genes (A. thaliana and O. sativa)
* ontology terms according to the Gene Ontology (GO) Consortium
* protein sequence motifs as predicted using the PFAM database
* transporter classification as predicted using the TC-system
* bibliographic references
* predictions for transmembrane spanning proteins (transmembrane alpha helices, beta barrels)
* predictions for membrane-anchored proteins (GPI attachment, prenylation, myristoylation)
* prediction of the subcellular location
* consensus predictions (transmembrane alpha helices, subcellular location)
* isospecific homologs ("paralogs")
* heterospecific homologs ("orthologs")
Fictions littéraires de Gallica / Literary fictions of Gallica
data.niaid.nih.gov
zenodo.org
Updated Jul 19, 2024
Cite
Langlais, Pierre-Carl (2024). Fictions littéraires de Gallica / Literary fictions of Gallica [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4660197
Dataset updated
Jul 19, 2024
Dataset authored and provided by
Langlais, Pierre-Carl
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The collection "Fictions littéraires de Gallica" includes 19,240 public domain documents from the digital platform of the French National Library that were originally classified as novels or, more broadly, as literary fiction in prose. It consists of 372 data tables in TSV format, one for each year of publication from 1600 to 1996 (all the missing years are in the 17th and 20th centuries). Each table is structured at the page level of each novel (5,723,986 pages in all) and contains the complete text with some added metadata. The tables can be opened in Excel or, preferably, in modern data analysis environments for R or Python (tidyverse, pandas…).
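
Since the corpus is split into one page-level TSV table per year, document-level statistics require aggregating across files. The sketch below counts pages per document using Python's standard `csv` module rather than pandas. The `.tsv` extension and the document-identifier column name `identifier` are hypothetical placeholders; check them against the real file headers and the accompanying data paper.

```python
import csv
from collections import Counter
from pathlib import Path

# Hypothetical column name: replace with the real document-identifier column.
DOC_ID_COLUMN = "identifier"

# Full-page OCR text can exceed the csv module's default field limit (131072 chars).
csv.field_size_limit(2**31 - 1)


def pages_per_document(tsv_dir: str, doc_id_column: str = DOC_ID_COLUMN) -> Counter:
    """Count page-level rows per document across all per-year TSV tables in tsv_dir."""
    counts: Counter = Counter()
    for path in sorted(Path(tsv_dir).glob("*.tsv")):
        with path.open(encoding="utf-8", newline="") as fh:
            # Default csv quoting is assumed; adjust `quoting` if the real files differ.
            for row in csv.DictReader(fh, delimiter="\t"):
                counts[row[doc_id_column]] += 1
    return counts
```

`pages_per_document(path).most_common(5)` would then list the five longest documents by page count.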

This corpus can be used for large-scale quantitative analyses in computational humanities. The OCR text is presented in a raw format without any correction or enrichment in order to be directly processed for text mining purposes.

The extraction is based on a historical categorization of the novels: the Y2 or Ybis classification. This classification, invented in 1730, is the only one that has been continuously applied to the BNF collections now available in the public domain (mainly before 1950). Consequently, the dataset is based on a definition of "novel" that is generally contemporary with the publication.

A French data paper (in PDF and HTML) presents the construction process of the Y2 category and describes the structuring of the corpus. It also gives several examples of possible uses for computational humanities projects.
Books Wholesalers in Nevada, United States - 1 Verified Listings Database
poidata.io
csv, excel, json
Updated Jul 1, 2025
Cite
Poidata.io (2025). Books Wholesalers in Nevada, United States - 1 Verified Listings Database [Dataset]. https://www.poidata.io/report/books-wholesaler/united-states/nevada
Available download formats
json, csv, excel
Dataset updated
Jul 1, 2025
Dataset provided by
Poidata.io
Area covered
Nevada, United States
Description
Comprehensive dataset of 1 book wholesaler in Nevada, United States, as of July 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
PiCoBoo database: Aunt Mavor's Picture Books for Little Readers [Second...
data.ncl.ac.uk
pdf
Updated May 31, 2023
Cite
Francesca Tancini (2023). PiCoBoo database: Aunt Mavor's Picture Books for Little Readers [Second Series] [Dataset]. http://doi.org/10.25405/data.ncl.15181071.v1
Available download formats
pdf
Unique identifier
https://doi.org/10.25405/data.ncl.15181071.v1
Dataset updated
May 31, 2023
Dataset provided by
Newcastle University
Authors
Francesca Tancini
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Aunt Mavor's Picture Books for Little Readers [Second Series]
Dataset of books called Advanced database techniques
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called Advanced database techniques [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Advanced+database+techniques
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row, filtered to the book Advanced database techniques. It features 7 columns, including author, publication date, language, and book publisher.
Books Wholesalers in Louisiana, United States - 5 Verified Listings Database...
poidata.io
csv, excel, json
Updated Jun 30, 2025
Cite
Poidata.io (2025). Books Wholesalers in Louisiana, United States - 5 Verified Listings Database [Dataset]. https://www.poidata.io/report/books-wholesaler/united-states/louisiana
Available download formats
json, excel, csv
Dataset updated
Jun 30, 2025
Dataset provided by
Poidata.io
Area covered
Louisiana, United States
Description
Comprehensive dataset of 5 book wholesalers in Louisiana, United States, as of June 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
institutional-books-1.0
huggingface.co
Updated Jun 11, 2025
Cite
Institutional Data Initiative (2025). institutional-books-1.0 [Dataset]. https://huggingface.co/datasets/institutional/institutional-books-1.0
Dataset updated
Jun 11, 2025
Dataset authored and provided by
Institutional Data Initiative
Description
📚 Institutional Books 1.0

Institutional Books is a growing corpus of public domain books. This 1.0 release comprises 983,004 public domain books digitized as part of Harvard Library's participation in the Google Books project and refined by the Institutional Data Initiative. Use of this data is governed by the IDI Terms of Use for Early-Access.

983K books, published largely in the 19th and 20th centuries
242B o200k_base tokens
386M pages of text, available in both original… See the full description on the dataset page: https://huggingface.co/datasets/institutional/institutional-books-1.0.
Print book unit sales in the U.S. 2004-2024
statista.com
Updated Jun 24, 2025
Cite
Statista (2025). Print book unit sales in the U.S. 2004-2024 [Dataset]. https://www.statista.com/statistics/422595/print-book-sales-usa/
Dataset updated
Jun 24, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
Data showing how many books were sold in 2024 revealed that the printed book market remains healthy: a total of ***** million units were sold that year among outlets that reported to the source. Whilst this marked a small jump from the previous year, the figure peaked in 2021 and has not surpassed *** million since. Trade paperbacks remained the dominant format.

Book sales statistics

Looking at book sales by year, 2005 to 2010 were the most lucrative for the printed book market, with well over *** million units sold annually during that period. After dropping below *** million in 2012, sales increased gradually and consistently each year, with the exception of 2018 to 2019. For bookstores, though, annual sales depend on the success of key months across the year. Bookstore sales in the United States are at their highest in December, January, and August, but December figures are consistently higher than those of other months. Books are popular holiday gifts: in annual surveys from 2012 to 2020, around ** to ** percent of consumers said they planned to purchase books as presents during the festive season.
