100+ datasets found
  1. Best Books Ever Dataset

    • zenodo.org
    csv
    Updated Nov 10, 2020
    Cite
    Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
    Available download formats
    csv
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lorena Casanova Lozano; Sergio Costa Planells
    License
    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset was collected as part of Prac1 for the course Typology and Data Life Cycle in the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).

    The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).

    The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset

    The data was retrieved in two sets: the first 30000 books and then the remaining 22478. Dates were not parsed and reformatted in the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and in Month Day Year format for the rest.
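
    Because the two chunks use different date layouts, a loader has to try both. The following is a minimal sketch using only the Python standard library. The exact spelling of the "Month Day Year" layout is not given in the description, so the `%B %d %Y` pattern, the handling of ordinal suffixes such as "14th", and the function name `parse_publish_date` are all assumptions to check against the real file.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumed layouts: mm/dd/yyyy for the first chunk, full month name for the rest.
# %B relies on English month names (the default C locale).
_FORMATS = ("%m/%d/%Y", "%B %d %Y")

# Matches an ordinal suffix that directly follows a digit, e.g. the "th" in "14th".
_ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def parse_publish_date(raw: Optional[str]) -> Optional[date]:
    """Return a date for either layout, or None when the value cannot be parsed."""
    if raw is None:
        return None
    # Normalise: drop ordinal suffixes and commas, collapse repeated whitespace.
    text = _ORDINAL.sub("", raw.strip()).replace(",", "")
    text = " ".join(text.split())
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```

    Returning None instead of raising keeps incomplete fields (firstPublishDate is only 59 percent complete) from aborting a full pass over the file.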

    Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repository.
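
    For readers who only need the idea, here is a hedged sketch of such a download step. It is not the code from the repository: the helper names `cover_path` and `download_cover`, the `covers` output directory, and the fallback `.jpg` extension are all assumptions made for illustration.

```python
import os
import urllib.request
from urllib.parse import urlparse


def cover_path(book_id: str, cover_url: str, out_dir: str = "covers") -> str:
    """Build a local file path for a cover, keeping the URL's file extension."""
    # Take the extension from the URL path only, ignoring any query string.
    ext = os.path.splitext(urlparse(cover_url).path)[1] or ".jpg"
    # Replace characters that are unsafe in file names with underscores.
    safe_id = "".join(c if c.isalnum() or c in "-_." else "_" for c in book_id)
    return os.path.join(out_dir, safe_id + ext)


def download_cover(book_id: str, cover_url: str, out_dir: str = "covers") -> str:
    """Download one cover image and return the path it was saved to."""
    os.makedirs(out_dir, exist_ok=True)
    path = cover_path(book_id, cover_url, out_dir)
    urllib.request.urlretrieve(cover_url, path)
    return path
```

    In practice the bookId and coverImg values of each row would be passed in, skipping rows where coverImg is empty.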

    The 25 fields of the dataset are:

    | Attributes | Definition | Completeness (%) |
    | ------------- | ------------- | ------------- |
    | bookId | Book Identifier as in goodreads.com | 100 |
    | title | Book title | 100 |
    | series | Series Name | 45 |
    | author | Book's Author | 100 |
    | rating | Global goodreads rating | 100 |
    | description | Book's description | 97 |
    | language | Book's language | 93 |
    | isbn | Book's ISBN | 92 |
    | genres | Book's genres | 91 |
    | characters | Main characters | 26 |
    | bookFormat | Type of binding | 97 |
    | edition | Type of edition (ex. Anniversary Edition) | 9 |
    | pages | Number of pages | 96 |
    | publisher | Publishing house | 93 |
    | publishDate | Publication date | 98 |
    | firstPublishDate | Publication date of first edition | 59 |
    | awards | List of awards | 20 |
    | numRatings | Number of total ratings | 100 |
    | ratingsByStars | Number of ratings by stars | 97 |
    | likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
    | setting | Story setting | 22 |
    | coverImg | URL to cover image | 99 |
    | bbeScore | Score in Best Books Ever list | 100 |
    | bbeVotes | Number of votes in Best Books Ever list | 100 |
    | price | Book's price (extracted from Iberlibro) | 73 |
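
    The likedPercent definition in the table can be written out as a small calculation. This is a sketch under stated assumptions: the star counts are passed as a mapping from star value to count, which may not match how ratingsByStars is stored in the CSV, and the function name `liked_percent` is hypothetical.

```python
from typing import Mapping, Optional


def liked_percent(ratings_by_star: Mapping[int, int]) -> Optional[float]:
    """Percent of ratings with more than 2 stars (that is, 3, 4 or 5 stars)."""
    total = sum(ratings_by_star.values())
    if total == 0:
        # No ratings at all: the percentage is undefined.
        return None
    liked = sum(count for stars, count in ratings_by_star.items() if stars > 2)
    return 100.0 * liked / total


# Made-up counts for illustration, not taken from the dataset.
example = {5: 50, 4: 30, 3: 10, 2: 5, 1: 5}
print(liked_percent(example))  # 90.0
```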

  2. Books Dataset

    • figshare.com
    txt
    Updated Jan 19, 2016
    Cite
    Giuseppe Mendola (2016). Books Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.1441255.v1
    Available download formats
    txt
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Giuseppe Mendola
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This database contains information about books gathered with the help of the Google Books API. The database contains 7 tables, 3 of which serve only to relate the other tables to one another. Tables: Books contains 1062 records. Authors contains 1595 records. Categories contains 109 records. Metadata contains 37 records. MD5 (GBooks_2015-06-09.sql) = bfd09094d0e123e668b2e58332b1a98b

  3. General Record of Incidence of Mortality (GRIM) books

    • data.gov.au
    • researchdata.edu.au
    • +2 more
    csv
    Updated Apr 14, 2025
    + more versions
    Cite
    Australian Institute of Health and Welfare (2025). General Record of Incidence of Mortality (GRIM) books [Dataset]. https://data.gov.au/data/dataset/grim-books
    Available download formats
    csv (25197618)
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Australian Institute of Health and Welfare (http://www.aihw.gov.au/)
    License
    Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Extracted in machine readable form from the AIHW General Record of Incidence of Mortality (GRIM) books.

    GRIM books are Excel workbooks that contain national level, historical and recent deaths data for specific causes of death. The tables present age- and sex-specific counts and rates by cause of death, along with other summary measures.

    GRIM books are available for all causes of death combined and 55 other cause of death groupings. They span different years for different causes of death, depending on the data available. GRIM books for some causes of death start at 1907 and they are the only national electronic tabulations of deaths data by cause registered before 1964. Data from 1964 onwards are sourced from the AIHW National Mortality Database. They include mortality data up to 2023.

    For more information, please see Deaths data at AIHW or contact us at deaths@aihw.gov.au.

    Also available on data.gov.au are the AIHW Mortality Over Regions and Time (MORT) books.

  4. Data from: NIST/ARPA-E Database of Novel and Emerging Adsorbent Materials

    • catalog.data.gov
    • datasets.ai
    • +3 more
    Updated Jul 29, 2022
    Cite
    National Institute of Standards and Technology (2022). NIST/ARPA-E Database of Novel and Emerging Adsorbent Materials [Dataset]. https://catalog.data.gov/dataset/nist-arpa-e-database-of-novel-and-emerging-adsorbent-materials-ad6ac
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The NIST/ARPA-E Database of Novel and Emerging Adsorbent Materials is a free, web-based catalog of adsorbent materials and measured adsorption properties of numerous materials obtained from article entries from the scientific literature. Search fields for the database include adsorbent material, adsorbate gas, experimental conditions (pressure, temperature), and bibliographic information (author, title, journal), and results from queries are provided as a list of articles matching the search parameters. The database also contains adsorption isotherms digitized from the cataloged articles, which can be compared visually online in the web application or exported for offline analysis.

  5. Japanese Novel Data

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Budwell, Jackson (2023). Japanese Novel Data [Dataset]. http://doi.org/10.7910/DVN/21YHPO
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Budwell, Jackson
    Description

    A dataset on Japanese novels. This dataset contains 1113 novels and the following variables: Title, words, unique words, unique words used once, UWUU%, kanji, kanji used once, kanji readings, difficulty, average sentence length, characters, publisher, pages, ASIN, and the Japanese title. Note: not all observations have complete publisher, page, ASIN, and Japanese title data. The variables title, words, unique words, unique words used once, UWUU%, kanji, kanji used once, kanji readings, difficulty, average sentence length, and characters were obtained from Jpdb.io. Publisher, pages, ASIN, and the Japanese title were obtained from Amazon.co.jp. The dataset was mined using Python and BS4 by CDT Budwell. This dataset was created in support of MA206 (Intro to Statistics) at USMA West Point by CDT Jackson Budwell '25.

  6. Data from: 1000PLUS Novels Corpus (1.0)

    • live.european-language-grid.eu
    binary format
    Updated Jul 18, 2019
    + more versions
    Cite
    (2019). 1000PLUS Novels Corpus (1.0) [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/8654
    Available download formats
    binary format
    Dataset updated
    Jul 18, 2019
    License
    Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Corpus of literary texts intended as a benchmark collection for text categorization. It contains 1000 novels written in Polish or translated into Polish by various authors. This is an extension of the 1000 Novels Corpus (http://hdl.handle.net/11321/312). Each text is stored as a separate .txt file with a .cmdi metadata description.

  7. Dataset of books about Database design

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books about Database design [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=j0-book_subject&fop0=%3D&fval0=Database+design&j=1&j0=book_subjects
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 321 rows and is filtered to books whose subject is Database design. It features 9 columns, including author, publication date, language, and book publisher.

  8. Dataset of books series that contain Logical database design principles

    • workwithdata.com
    Updated Nov 25, 2024
    Cite
    Work With Data (2024). Dataset of books series that contain Logical database design principles [Dataset]. https://www.workwithdata.com/datasets/book-series?f=1&fcol0=j0-book&fop0=%3D&fval0=Logical+database+design+principles&j=1&j0=books
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Work With Data
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book series. It has 2 rows and is filtered to series that contain the book Logical database design principles. It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.

  9. txtLAB Contemporary Novel Data Set

    • figshare.com
    txt
    Updated Jan 19, 2016
    Cite
    Andrew Piper (2016). txtLAB Contemporary Novel Data Set [Dataset]. http://doi.org/10.6084/m9.figshare.2061990.v3
    Available download formats
    txt
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Andrew Piper
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This table contains a list of ca. 1200 novels that represent different genres of contemporary writing. It is discussed in "How Cultural Capital Works," Post-45 (2016): http://post45.research.yale.edu

  10. The Novel as Global Form Project: Critical Reception and Circulation Data

    • recerca.uoc.edu
    • dataverse.csuc.cat
    • +1 more
    Updated 2024
    Cite
    Bellido, Aitana; Ikoff, Ventsislav; Sangrà Bruguera, Marta; Puxan-Oliva, Marta; Rotger, Neus (2024). The Novel as Global Form Project: Critical Reception and Circulation Data [Dataset]. https://recerca.uoc.edu/documentos/67a9c7bb19544708f8c70c3f
    Dataset updated
    2024
    Authors
    Bellido, Aitana; Ikoff, Ventsislav; Sangrà Bruguera, Marta; Puxan-Oliva, Marta; Rotger, Neus
    Description

    The present dataset traces the critical reception and international circulation of a selection of contemporary novels published between years 1989 and 2021 that can be productively read through the lens of the “global novel” debate. The archive retrieves more than 1500 data entries concerning the translation, circulation and consecration of our corpus novels. This includes all original and translated editions, literary reviews and interviews with the authors in different countries, specialized academic works, awarded or short-listed literary prizes, film and theater adaptations, as well as metadata from relevant literary agents involved in their international circulation–from translators to literary agents, foreign rights agents, and publishers. These data have served as the basis for the qualitative case studies carried out within the framework of the project “The Novel as Global Form. Poetic Challenges and Cross-border Literary Circulation” (Spanish Research Agency, PID2020-118610GA-I00) of the Universitat Oberta de Catalunya (UOC) in Barcelona, Spain.

    The compiled data respond to the need to answer project questions such as “How globally do ‘global authors’ actually circulate?”, “What paths of circulation and recognition emerge when we consider non-Anglophone authors?”, and “What is the role of gender in the writing and editorial process?”. The data therefore include parameters that respond to the project research questions and might be uneven in relation to other matters, especially regarding geographical or linguistic representativeness, which was not the project’s focus.

    The project's selected novels are the following: Norwegian Gert Nygårdshaug’s Mengele Zoo (1989); Georgian Aka Morchiladze’s მოგზაურობა ყარაბაღში (Journey to Karabakh, 1992); Colombian Juan Gabriel Vásquez’s Historia secreta de Costaguana (The Secret History of Costaguana, 2007); Polish Olga Tokarczuk’s Bieguni (Flights, 2008); Brazilian Patricia Melo’s O Ladrão de Cadáveres (The Body Snatcher, 2010); Argentinian Ariana Harwicz’s Mátate, amor (Die, My Love, 2012); South African and Australian J. M. Coetzee’s Jesus trilogy (The Childhood of Jesus, 2013, The Schooldays of Jesus, 2016, and The Death of Jesus, 2019); Nino Haratischwili’s Das achte Leben (für Brilka) (The Eighth Life (for Brilka), 2014); Brazilian Carla Madeira’s Tudo é rio (‘Everything is Rio’, 2014); Argentinian Samanta Schweblin’s Distancia de rescate (Fever Dream, 2014); Italian Bruno Arpaia’s Qualcosa, là fuori (‘Something, Out There’, 2016); French Élisabeth Filhol’s Doggerland (2019); Lebanese Zena El Khalil’s Beirut, I Love You (2019); Catalan Irene Solà’s Canto jo i la muntanya balla (When I Sing, Mountains Dance, 2019); and Turkish-British Elif Shafak’s The Island of Missing Trees (2021).

  11. Psy-Data-books

    • huggingface.co
    Updated Jun 19, 2025
    Cite
    Ammar (2025). Psy-Data-books [Dataset]. https://huggingface.co/datasets/Daemontatox/Psy-Data-books
    Dataset updated
    Jun 19, 2025
    Authors
    Ammar
    License
    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    🧠 Psy-Data-Books: Synthetic Medical & Psychology Conversation Dataset

    Psy-Data-Books is one of the largest synthetic datasets of psychology and medical conversations, generated from verified medical and psychology literature. It is designed for building and training powerful conversational AI systems for healthcare, therapy, and mental health applications.

    📊 Dataset Summary

    Domain: Psychology, Psychiatry, Mental Health, General Medicine
    Data Type: Synthetic…
    See the full description on the dataset page: https://huggingface.co/datasets/Daemontatox/Psy-Data-books.

  12. Global import data of Books

    • volza.com
    csv
    Updated Jun 24, 2025
    + more versions
    Cite
    Volza FZ LLC (2025). Global import data of Books [Dataset]. https://www.volza.com/p/books/import/import-in-united-states/
    Available download formats
    csv
    Dataset updated
    Jun 24, 2025
    Dataset provided by
    Volza
    Authors
    Volza FZ LLC
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Count of importers, Sum of import value, 2014-01-01/2021-09-30, Count of import shipments
    Description

    459554 global import shipment records of books, with prices, volumes, and current buyer-supplier relationships, based on an actual global export trade database.

  13. Data from: ARAMEMNON, a Novel Database for Arabidopsis Integral Membrane...

    • neuinfo.org
    • scicrunch.org
    • +3 more
    Updated Jan 29, 2022
    Cite
    (2022). ARAMEMNON, a Novel Database for Arabidopsis Integral Membrane Proteins [Dataset]. http://identifiers.org/RRID:SCR_007552/resolver?q=&i=rrid
    Dataset updated
    Jan 29, 2022
    Description

    A database of putative membrane proteins of Thale Cress (Arabidopsis thaliana), Rice (Oryza sativa), and some 6700 putative membrane proteins of ~300 other seed plants. The database stores data about:
    * protein, cDNA and genomic sequences
    * exon predictions (A. thaliana and O. sativa)
    * different cDNA/protein models of genes (A. thaliana and O. sativa)
    * ontology terms according to the Gene Ontology (GO) Consortium
    * protein sequence motifs as predictable by using the PFAM database
    * transporter classification as predictable by using the TC-system
    * bibliographic references
    * predictions for transmembrane spanning proteins (transmembrane alpha helices, beta barrels)
    * predictions for membrane-anchored proteins (GPI-attachment, prenylation, myristoylation)
    * prediction of the subcellular location
    * consensus predictions (transmembrane alpha helices, subcellular location)
    * isospecific homologs ("paralogs")
    * heterospecific homologs ("orthologs")

  14. Fictions littéraires de Gallica / Literary fictions of Gallica

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 19, 2024
    + more versions
    Cite
    Langlais, Pierre-Carl (2024). Fictions littéraires de Gallica / Literary fictions of Gallica [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4660197
    Dataset updated
    Jul 19, 2024
    Dataset authored and provided by
    Langlais, Pierre-Carl
    License
    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The collection "Fiction littéraire de Gallica" includes 19,240 public domain documents from the digital platform of the French National Library that were originally classified as novels or, more broadly, as literary fiction in prose. It consists of 372 tables of data in tsv format for each year of publication from 1600 to 1996 (all the missing years are in the 17th and 20th centuries). Each table is structured at the page-level of each novel (5,723,986 pages in all). It contains the complete text with the addition of some metadata. It can be opened in Excel or, preferably, with the new data analysis environments in R or Python (tidyverse, pandas…)
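
    As an illustration of reading one of these page-level tables, here is a minimal sketch using only Python's standard `csv` module. The column names in the sample (`doc_id`, `page`, `text`) are placeholders, since the description does not list the actual headers, and `iter_rows` is a hypothetical helper name.

```python
import csv
import io
from typing import Iterator, TextIO


def iter_rows(handle: TextIO) -> Iterator[dict]:
    """Yield one dict per row of a tab-separated table that has a header line."""
    # Raw OCR text can contain stray quote characters, so quoting is disabled
    # and every tab is treated as a field separator.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Tiny in-memory stand-in for one yearly table; real column names may differ.
sample = io.StringIO(
    'doc_id\tpage\ttext\n'
    'book1\t1\tIl était une fois "un roi\n'
    'book1\t2\tFin\n'
)
rows = list(iter_rows(sample))
```

    For a real file, the same function can be given a handle opened with `open(path, encoding="utf-8", newline="")`. Streaming rows one at a time avoids loading a whole year of page text into memory.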

    This corpus can be used for large-scale quantitative analyses in computational humanities. The OCR text is presented in a raw format without any correction or enrichment in order to be directly processed for text mining purposes.

    The extraction is based on a historical categorization of the novels: the Y2 or Ybis classification. This classification, invented in 1730, is the only one that has been continuously applied to the BNF collections now available in the public domain (mainly before 1950). Consequently, the dataset is based on a definition of "novel" that is generally contemporary with the publication.

    A French data paper (in PDF and HTML) presents the construction process of the Y2 category and describes the structuring of the corpus. It also gives several examples of possible uses for computational humanities projects.

  15. Books Wholesalers in Nevada, United States - 1 Verified Listings Database

    • poidata.io
    csv, excel, json
    Updated Jul 1, 2025
    Cite
    Poidata.io (2025). Books Wholesalers in Nevada, United States - 1 Verified Listings Database [Dataset]. https://www.poidata.io/report/books-wholesaler/united-states/nevada
    Available download formats
    json, csv, excel
    Dataset updated
    Jul 1, 2025
    Dataset provided by
    Poidata.io
    Area covered
    Nevada, United States
    Description

    Comprehensive dataset of 1 Books wholesalers in Nevada, United States as of July, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.

  16. PiCoBoo database: Aunt Mavor's Picture Books for Little Readers [Second...

    • data.ncl.ac.uk
    pdf
    Updated May 31, 2023
    + more versions
    Cite
    Francesca Tancini (2023). PiCoBoo database: Aunt Mavor's Picture Books for Little Readers [Second Series] [Dataset]. http://doi.org/10.25405/data.ncl.15181071.v1
    Available download formats
    pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Newcastle University
    Authors
    Francesca Tancini
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Aunt Mavor's Picture Books for Little Readers [Second Series]

  17. Dataset of books called Advanced database techniques

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Advanced database techniques [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Advanced+database+techniques
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered to the book titled Advanced database techniques. It features 7 columns, including author, publication date, language, and book publisher.

  18. Books Wholesalers in Louisiana, United States - 5 Verified Listings Database...

    • poidata.io
    csv, excel, json
    Updated Jun 30, 2025
    Cite
    Poidata.io (2025). Books Wholesalers in Louisiana, United States - 5 Verified Listings Database [Dataset]. https://www.poidata.io/report/books-wholesaler/united-states/louisiana
    Available download formats
    json, excel, csv
    Dataset updated
    Jun 30, 2025
    Dataset provided by
    Poidata.io
    Area covered
    Louisiana, United States
    Description

    Comprehensive dataset of 5 Books wholesalers in Louisiana, United States as of June, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.

  19. institutional-books-1.0

    • huggingface.co
    Updated Jun 11, 2025
    + more versions
    Cite
    Institutional Data Initiative (2025). institutional-books-1.0 [Dataset]. https://huggingface.co/datasets/institutional/institutional-books-1.0
    Dataset updated
    Jun 11, 2025
    Dataset authored and provided by
    Institutional Data Initiative
    Description

    📚 Institutional Books 1.0

    Institutional Books is a growing corpus of public domain books. This 1.0 release is comprised of 983,004 public domain books digitized as part of Harvard Library's participation in the Google Books project and refined by the Institutional Data Initiative. Use of this data is governed by the IDI Terms of Use for Early-Access.

    983K books, published largely in the 19th and 20th centuries
    242B o200k_base tokens
    386M pages of text, available in both original…
    See the full description on the dataset page: https://huggingface.co/datasets/institutional/institutional-books-1.0.

  20. Print book unit sales in the U.S. 2004-2024

    • statista.com
    Updated Jun 24, 2025
    Cite
    Statista (2025). Print book unit sales in the U.S. 2004-2024 [Dataset]. https://www.statista.com/statistics/422595/print-book-sales-usa/
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    Data showing how many books were sold in 2024 revealed that the printed book market remains healthy: a total of ***** million units were sold that year among outlets which reported to the source. Whilst this marked a small jump from the previous year, the figure peaked in 2021 and has not surpassed *** million since. Trade paperbacks remained the dominant format.

    Book sales statistics

    Looking at book sales by year, 2005 to 2010 were the most lucrative for the printed book market, with well over *** million units sold annually during that five-year period. After dropping below *** million in 2012, gradual and consistent increases can be seen each year, with the exception of between the years 2018 and 2019. For bookstores though, how many books are sold each year depends on the success of key months across a twelve-month period. Bookstore sales in the United States are at their highest in December, January, and August, but figures for December are consistently higher than other months. Books are popular holiday gifts, with around ** to ** percent of consumers responding to annual surveys in each year from 2012 to 2020 saying that they planned to purchase books as presents during the festive season.
