Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The dataset was collected as part of Prac1 of the subject Typology and Data Life Cycle of the Master's Degree in Data Science of the Universitat Oberta de Catalunya (UOC).
The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).
The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset
The data was retrieved in two sets: the first 30000 books and then the remaining 22478. Dates were not parsed and reformatted in the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and in Month Day Year format for the rest.
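Because the two chunks use different date formats, a consumer of the data will probably want to normalize both date fields before analysis. The following is a minimal sketch, not the authors' code: the function name `parse_publish_date` is hypothetical, and the handling of ordinal suffixes (for example "14th") is an assumption about how the Month Day Year values might be written.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumption: day numbers may carry an ordinal suffix ("1st", "14th").
# Stripping the suffix is harmless when it is absent.
_ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def parse_publish_date(raw: str) -> Optional[date]:
    """Parse either documented format; return None if neither matches."""
    text = _ORDINAL.sub("", raw.strip())
    # "%m/%d/%Y" covers the first chunk, "%B %d %Y" the second.
    # "%B" matches English month names only under an English/C locale.
    for fmt in ("%m/%d/%Y", "%B %d %Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```

Values that match neither format come back as `None`, so they can be counted and inspected rather than silently dropped.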
Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repo.
The 25 fields of the dataset are:
| Attributes | Definition | Completeness |
| ------------- | ------------- | ------------- |
| bookId | Book Identifier as in goodreads.com | 100 |
| title | Book title | 100 |
| series | Series Name | 45 |
| author | Book's Author | 100 |
| rating | Global goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (e.g., Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Publishing house | 93 |
| publishDate | Publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Number of total ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL to cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
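The likedPercent field is described as the share of ratings above 2 stars, so it can in principle be recomputed from ratingsByStars. The sketch below illustrates that arithmetic only. It assumes the five counts are ordered from 5 stars down to 1 star, which the description does not state, and the function name `liked_percent` is hypothetical.

```python
from typing import Optional, Sequence


def liked_percent(ratings_by_stars: Sequence[int]) -> Optional[float]:
    """Percent of ratings at 3 stars or above.

    Assumption: counts are ordered [5 stars, 4 stars, 3 stars, 2 stars, 1 star].
    Returns None when the input is malformed or there are no ratings.
    """
    if len(ratings_by_stars) != 5:
        return None
    total = sum(ratings_by_stars)
    if total == 0:
        return None
    liked = sum(ratings_by_stars[:3])  # 5-, 4- and 3-star counts
    return 100.0 * liked / total
```

If the stored order turns out to be ascending instead, slicing the last three counts rather than the first three gives the same measure.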
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database contains information about books gathered with the help of the Google Books API. It contains 7 tables, 3 of which serve only to relate the other tables to one another. Tables: Books (1062 records), Authors (1595 records), Categories (109 records), Metadata (37 records). MD5 (GBooks_2015-06-09.sql) = bfd09094d0e123e668b2e58332b1a98b
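The published MD5 value can be used to check that a downloaded copy of the SQL dump is intact. The following is a minimal sketch using Python's standard `hashlib`. The helper names `md5_of_file` and `matches_published_checksum` are hypothetical, and the local file path is whatever the reader saved the dump as.

```python
import hashlib

# Checksum quoted in the dataset description for GBooks_2015-06-09.sql.
EXPECTED_MD5 = "bfd09094d0e123e668b2e58332b1a98b"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str) -> bool:
    """True when the file's MD5 equals the value given in the description."""
    return md5_of_file(path) == EXPECTED_MD5
```

MD5 is adequate here for detecting accidental corruption during download. It should not be treated as protection against deliberate tampering.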
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Extracted in machine readable form from the AIHW General Record of Incidence of Mortality (GRIM) books.
GRIM books are Excel workbooks that contain national level, historical and recent deaths data for specific causes of death. The tables present age- and sex-specific counts and rates by cause of death, along with other summary measures.
GRIM books are available for all causes of death combined and 55 other cause of death groupings. They span different years for different causes of death, depending on the data available. GRIM books for some causes of death start at 1907 and they are the only national electronic tabulations of deaths data by cause registered before 1964. Data from 1964 onwards are sourced from the AIHW National Mortality Database. They include mortality data up to 2023.
For more information, please see Deaths data at AIHW or contact us at deaths@aihw.gov.au.
Also available on data.gov.au are the AIHW Mortality Over Regions and Time (MORT) books.
The NIST/ARPA-E Database of Novel and Emerging Adsorbent Materials is a free, web-based catalog of adsorbent materials and measured adsorption properties of numerous materials obtained from article entries from the scientific literature. Search fields for the database include adsorbent material, adsorbate gas, experimental conditions (pressure, temperature), and bibliographic information (author, title, journal), and results from queries are provided as a list of articles matching the search parameters. The database also contains adsorption isotherms digitized from the cataloged articles, which can be compared visually online in the web application or exported for offline analysis.
A dataset on Japanese novels. It contains 1113 novels and the following variables: Title, words, unique words, unique words used once, UWUU%, kanji, kanji used once, kanji readings, difficulty, average sentence length, characters, publisher, pages, ASIN, and the Japanese title. Note: not all observations have complete publisher, page, ASIN, and Japanese title data. The variables title, words, unique words, unique words used once, UWUU%, kanji, kanji used once, kanji readings, difficulty, average sentence length, and characters were obtained from Jpdb.io. Publisher, pages, ASIN, and the Japanese title were obtained from Amazon.co.jp. The dataset was mined using Python and BS4 by CDT Jackson Budwell '25 in support of MA206 (Intro to Statistics) at USMA West Point.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Corpus of literary texts intended as a benchmark collection for text categorization. It contains 1000 novels by various authors, written in Polish or translated into Polish. This is an extension of the 1000 Novels Corpus (http://hdl.handle.net/11321/312). Each text is stored as a separate .txt file with a .cmdi metadata description.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 321 rows, filtered to books whose subject is Database design. It features 9 columns including author, publication date, language, and book publisher.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book series. It has 2 rows, filtered to series that include the book Logical database design principles. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This table contains a list of ca. 1200 novels that represent different genres of contemporary writing. It is discussed in "How Cultural Capital Works" Post-45 (2016): http://post45.research.yale.edu
The present dataset traces the critical reception and international circulation of a selection of contemporary novels published between 1989 and 2021 that can be productively read through the lens of the “global novel” debate. The archive retrieves more than 1500 data entries concerning the translation, circulation and consecration of our corpus novels. This includes all original and translated editions, literary reviews and interviews with the authors in different countries, specialized academic works, awarded or short-listed literary prizes, film and theater adaptations, as well as metadata from relevant literary agents involved in their international circulation, from translators to literary agents, foreign rights agents, and publishers. These data have served as the basis for the qualitative case studies carried out within the framework of the project “The Novel as Global Form. Poetic Challenges and Cross-border Literary Circulation” (Spanish Research Agency, PID2020-118610GA-I00) of the Universitat Oberta de Catalunya (UOC) in Barcelona, Spain.
The compiled data respond to the need to answer project questions such as “How globally do ‘global authors’ actually circulate?”, “What paths of circulation and recognition emerge when we consider non-Anglophone authors?”, and “What is the role of gender in the writing and editorial process?”. The data therefore include parameters that respond to the project's research questions and might be uneven in relation to other matters, especially regarding geographical or linguistic representativeness, which was not the project’s focus.
The project's selected novels are the following: Norwegian Gert Nygårdshaug’s Mengele Zoo (1989); Georgian Aka Morchiladze’s მოგზაურობა ყარაბაღში (Journey to Karabakh, 1992); Colombian Juan Gabriel Vásquez’s Historia secreta de Costaguana (The Secret History of Costaguana, 2007); Polish Olga Tokarczuk’s Bieguni (Flights, 2008); Brazilian Patricia Melo’s O Ladrão de Cadáveres (The Body Snatcher, 2010); Argentinian Ariana Harwicz’s Mátate, amor (Die, My Love, 2012); South African and Australian J. M. Coetzee’s Jesus trilogy (The Childhood of Jesus, 2013, The Schooldays of Jesus, 2016, and The Death of Jesus, 2019); Nino Haratischwili’s Das achte Leben (für Brilka) (The Eighth Life (for Brilka), 2014); Brazilian Carla Madeira’s Tudo é rio (‘Everything is Rio’, 2014); Argentinian Samanta Schweblin’s Distancia de rescate (Fever Dream, 2014); Italian Bruno Arpaia’s Qualcosa, là fuori (‘Something, Out There’, 2016); French Élisabeth Filhol’s Doggerland (2019); Lebanese Zena El Khalil’s Beirut, I Love You (2019); Catalan Irene Solà’s Canto jo i la muntanya balla (When I Sing, Mountains Dance, 2019); and Turkish-British Elif Shafak’s The Island of Missing Trees (2021).
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
🧠 Psy-Data-Books: Synthetic Medical & Psychology Conversation Dataset
Psy-Data-Books is one of the largest synthetic datasets of psychology and medical conversations, generated from verified medical and psychology literature. It is designed for building and training powerful conversational AI systems for healthcare, therapy, and mental health applications.
📊 Dataset Summary
Domain: Psychology, Psychiatry, Mental Health, General Medicine
Data Type: Synthetic… See the full description on the dataset page: https://huggingface.co/datasets/Daemontatox/Psy-Data-books.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
459554 global import shipment records of books, with prices, volumes, and current buyer-supplier relationships, based on an actual global export trade database.
A database of putative membrane proteins of Thale Cress (Arabidopsis thaliana) and Rice (Oryza sativa), plus some 6700 putative membrane proteins of ~300 other seed plants. The database stores data about:
* protein, cDNA and genomic sequences
* exon predictions (A. thaliana and O. sativa)
* different cDNA/protein models of genes (A. thaliana and O. sativa)
* ontology terms according to the Gene Ontology (GO) Consortium
* protein sequence motifs as predictable by using the PFAM database
* transporter classification as predictable by using the TC-system
* bibliographic references
* predictions for transmembrane spanning proteins (transmembrane alpha helices, beta barrels)
* predictions for membrane-anchored proteins (GPI-attachment, prenylation, myristoylation)
* prediction of the subcellular location
* consensus predictions (transmembrane alpha helices, subcellular location)
* isospecific homologs ("paralogs")
* heterospecific homologs ("orthologs")
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The collection "Fiction littéraire de Gallica" includes 19,240 public domain documents from the digital platform of the French National Library that were originally classified as novels or, more broadly, as literary fiction in prose. It consists of 372 tables of data in tsv format for each year of publication from 1600 to 1996 (all the missing years are in the 17th and 20th centuries). Each table is structured at the page-level of each novel (5,723,986 pages in all). It contains the complete text with the addition of some metadata. It can be opened in Excel or, preferably, with the new data analysis environments in R or Python (tidyverse, pandas…)
This corpus can be used for large-scale quantitative analyses in computational humanities. The OCR text is presented in a raw format without any correction or enrichment in order to be directly processed for text mining purposes.
The extraction is based on a historical categorization of the novels: the Y2 or Ybis classification. This classification, invented in 1730, is the only one that has been continuously applied to the BNF collections now available in the public domain (mainly before 1950). Consequently, the dataset is based on a definition of "novel" that is generally contemporary with the publication.
A French data paper (in PDF and HTML) presents the construction process of the Y2 category and describes the structuring of the corpus. It also gives several examples of possible uses for computational humanities projects.
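Since each yearly table is a TSV file with one row per page, a common first step is to regroup the page rows by document. The sketch below uses only Python's standard `csv` module and makes no claim about the corpus's actual schema: the column names are passed in by the caller, and the function name `pages_by_document` is hypothetical. Quoting is disabled on the assumption that raw OCR text may contain stray quote characters.

```python
import csv
from collections import defaultdict
from typing import Dict, List, TextIO


def pages_by_document(
    tsv_file: TextIO, id_column: str, text_column: str
) -> Dict[str, List[str]]:
    """Group page-level rows of a TSV table by document identifier.

    id_column and text_column are supplied by the caller; check the
    header of the actual files for the real column names.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        # Rows are appended in file order, so page order is preserved.
        grouped[row[id_column]].append(row[text_column])
    return dict(grouped)
```

Joining each list with newlines then yields the full raw text of a document for text mining.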
Comprehensive dataset of 1 Books wholesalers in Nevada, United States as of July, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aunt Mavor's Picture Books for Little Readers [Second Series]
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row, filtered to the book Advanced database techniques. It features 7 columns including author, publication date, language, and book publisher.
Comprehensive dataset of 5 Books wholesalers in Louisiana, United States as of June, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
📚 Institutional Books 1.0
Institutional Books is a growing corpus of public domain books. This 1.0 release is comprised of 983,004 public domain books digitized as part of Harvard Library's participation in the Google Books project and refined by the Institutional Data Initiative. Use of this data is governed by the IDI Terms of Use for Early-Access.
983K books, published largely in the 19th and 20th centuries
242B o200k_base tokens
386M pages of text, available in both original… See the full description on the dataset page: https://huggingface.co/datasets/institutional/institutional-books-1.0.
Data showing how many books were sold in 2024 revealed that the printed book market remains healthy: a total of ***** million units were sold that year among outlets which reported to the source. Whilst this marked a small jump from the previous year, the figure peaked in 2021 and has not surpassed *** million since. Trade paperbacks remained the dominant format.
Book sales statistics
Looking at book sales by year, 2005 to 2010 were the most lucrative for the printed book market, with well over *** million units sold annually during that five-year period. After dropping below *** million in 2012, gradual and consistent increases can be seen each year, with the exception of between the years 2018 and 2019. For bookstores though, how many books are sold each year depends on the success of key months across a twelve-month period. Bookstore sales in the United States are at their highest in December, January, and August, but figures for December are consistently higher than other months. Books are popular holiday gifts, with around ** to ** percent of consumers responding to annual surveys in each year from 2012 to 2020 saying that they planned to purchase books as presents during the festive season.