Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The dataset was collected as part of Prac1 of the subject Typology and Data Life Cycle of the Master's Degree in Data Science of the Universitat Oberta de Catalunya (UOC).
The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).
The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset
The data was retrieved in two sets: the first 30000 books and then the remaining 22478. Dates were not parsed and reformatted in the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and in Month Day Year format for the rest.
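Because the two chunks use different date layouts, both fields need a tolerant parser before they can be compared. The following is a minimal sketch, not the authors' code: the function name `parse_publish_date`, the exact format strings, and the handling of commas and ordinal suffixes are assumptions about how "Month Day Year" values might be spelled, and month-name matching assumes an English locale.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed layouts: numeric mm/dd/yyyy (first chunk) and spelled-out
# "Month Day Year" with a full or abbreviated month name (second chunk).
DATE_FORMATS = ("%m/%d/%Y", "%B %d %Y", "%b %d %Y")


def parse_publish_date(raw: Optional[str]) -> Optional[datetime]:
    """Return a datetime for either layout, or None if nothing matches."""
    if not raw:
        return None
    # Replace commas with spaces and strip ordinal suffixes ("14th" -> "14").
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", raw.replace(",", " "))
    # Collapse repeated whitespace so the format strings match exactly.
    cleaned = " ".join(cleaned.split())
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    return None
```

Values that match none of the assumed layouts come back as `None`, so they can be counted and inspected rather than silently coerced.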
Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repository.
The 25 fields of the dataset are:
| Attribute | Definition | Completeness (%) |
| ------------- | ------------- | ------------- |
| bookId | Book Identifier as in goodreads.com | 100 |
| title | Book title | 100 |
| series | Series Name | 45 |
| author | Book's Author | 100 |
| rating | Global goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (e.g. Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Publishing house | 93 |
| publishDate | Publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Number of total ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL to cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
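The likedPercent field is described as the share of ratings above 2 stars, so it can in principle be recomputed from ratingsByStars. The sketch below illustrates that arithmetic under an assumption the description does not state: that the star counts are given as five integers ordered from 5 stars down to 1 star. The function name `liked_percent` is hypothetical.

```python
from typing import Optional, Sequence


def liked_percent(ratings_by_stars: Sequence[int]) -> Optional[float]:
    """Percent of ratings with 3 or more stars.

    Assumes counts are ordered [5 stars, 4 stars, 3 stars, 2 stars, 1 star].
    Returns None when the input is incomplete or there are no ratings.
    """
    if len(ratings_by_stars) != 5:
        return None
    total = sum(ratings_by_stars)
    if total == 0:
        return None
    liked = sum(ratings_by_stars[:3])  # 5-, 4- and 3-star ratings
    return 100.0 * liked / total
```

If the stored order turns out to be 1 star first, the slice would need to be `ratings_by_stars[2:]` instead.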
📚 Institutional Books 1.0
Institutional Books is a growing corpus of public domain books. This 1.0 release is comprised of 983,004 public domain books digitized as part of Harvard Library's participation in the Google Books project and refined by the Institutional Data Initiative. Use of this data is governed by the IDI Terms of Use for Early-Access.
983K books, published largely in the 19th and 20th centuries; 242B o200k_base tokens; 386M pages of text, available in both original… See the full description on the dataset page: https://huggingface.co/datasets/institutional/institutional-books-1.0.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database contains information about books gathered with the help of the Google Books API. It contains 7 tables, 3 of which serve only to relate the other tables to one another. Tables: Books (1062 records), Authors (1595 records), Categories (109 records), Metadata (37 records). MD5 (GBooks_2015-06-09.sql) = bfd09094d0e123e668b2e58332b1a98b
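The published MD5 value lets a downloaded copy of the SQL dump be checked for corruption. A minimal sketch using Python's standard `hashlib` follows; the helper names `md5_of_file` and `matches_published_checksum` are illustrative, and the local file path is whatever the download was saved as.

```python
import hashlib

# Checksum quoted in the dataset description for GBooks_2015-06-09.sql.
EXPECTED_MD5 = "bfd09094d0e123e668b2e58332b1a98b"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large dumps need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str) -> bool:
    """True if the file's MD5 equals the value quoted above."""
    return md5_of_file(path) == EXPECTED_MD5
```

MD5 is adequate for detecting accidental corruption here, though it should not be relied on as protection against deliberate tampering.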
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The dataset contains metadata for over 200,000 Christian books and papers covering a wide range of genres and topics. The data was collected from a variety of sources, including online retailers, libraries, and publishers.
This dataset contains a series of metadata for Christian books, including the following fields:
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Card for "Hindawi Books Dataset"
Hindawi Books Dataset is a large collection of more than 3000 books written in Modern Standard Arabic.
Dataset Description
Hindawi Books Dataset offers a rich and diverse collection of literary works, covering various topics and genres, all written in Modern Standard Arabic. The dataset includes information about each book, such as the title, author name, book abstract, and a link to access the complete text online. Additionally… See the full description on the dataset page: https://huggingface.co/datasets/alielfilali01/Hindawi-Books-dataset.
https://choosealicense.com/licenses/gpl-3.0/
Arabic Books
Dataset Summary
The arabic-books dataset contains 8,500 rows of text, each representing the full text of a single Arabic book. These texts were extracted using the arabic-large-nougat model, showcasing the model’s capabilities in Arabic OCR and text extraction. The dataset spans a total of 1.1 billion tokens, calculated using the GPT-4 tokenizer. This dataset is a testimony to the quality of the Arabic Nougat models and their effectiveness in extracting… See the full description on the dataset page: https://huggingface.co/datasets/MohamedRashad/arabic-books.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
## Overview
Oriented Books is a dataset for object detection tasks - it contains Book annotations for 661 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [MIT license](https://creativecommons.org/licenses/MIT).
Reading books remains a popular pastime for U.S. adults, with ** percent of respondents to a 2021 survey saying that they had read a book in any format within the last year. Despite online media formats now being the preferred option for many consumers when it comes to television, music, and gaming, print books are by far the most popular format among readers in the United States. Whilst almost double the share of adults now read audiobooks compared to 2011, only ** percent claimed to have read an audiobook in the last year compared to ** percent who said that they had read a print book.
Book sales in the United States
In 2020, bookstore sales in the United States amounted to **** billion U.S. dollars. Sales in 2019 and 2020 were the lowest recorded since the early *****, and the combined effect of the coronavirus outbreak, along with the growing appeal of online purchasing, will likely mean that bookstore sales will continue to drop. Bookstores tend to see most success in August, December, and January, and sales revenue often surpasses *********** U.S. dollars in those months each year. That said, monthly retail sales of bookstores in the U.S. are notably lower overall than in previous years and were particularly poor in spring 2020 as a result of national shutdowns to stem the spread of COVID-19.
Influence of COVID-19 on reading habits
The coronavirus pandemic led to increased media consumption in general, but not only among avid video and music streaming fans. Data from a survey in March 2020 revealed that ** percent of Millennials read more books due to the COVID-19 outbreak, making consumers in this group the most likely to have done so compared to ** percent of the total survey sample. Meanwhile, ** percent of Boomers said that their reading habits had not changed.
This data was gathered as part of the data mining project for the General Assembly Data Science Immersive course.
This data was acquired from the Google Books store using the Google API. Nine features were gathered for each book in the dataset. The column names are mostly self-explanatory; nevertheless, they are explained below.
I would like to thank Google for making a freely available API for their services and websites. I would also like to acknowledge the effort of the web scraper extension developer; it is a really nice and powerful tool for web scraping.
©2019 Google
Here is a story. You love reading books, and recently you bought a book that you thought you would like. However, after reading half of it, you still don't feel the enthusiasm and joy you expected. I think that machine learning algorithms might help solve such problems.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 2,617,384 rows. It features 4 columns including author, book publisher, and ISBN. It is 97% filled with non-null values.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 6 rows and is filtered where the book is Genre. It features 7 columns including author, publication date, language, and book publisher.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
N-grams are fixed-size tuples of items. In this case the items are words extracted from the Google Books corpus. The n specifies the number of elements in the tuple, so a 5-gram contains five words or characters. The n-grams in this dataset were produced by passing a sliding window over the text of books and outputting a record for each new token.
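The sliding-window idea can be illustrated in a few lines. This is a sketch of the general technique rather than the pipeline used to build the dataset; the function name `ngrams` and the whitespace tokenization in the usage example are assumptions made for illustration.

```python
from typing import Iterator, List, Tuple


def ngrams(tokens: List[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every window of n consecutive tokens, advancing one token at a time."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # A text shorter than n tokens produces no windows at all.
    for start in range(len(tokens) - n + 1):
        yield tuple(tokens[start:start + n])


words = "the quick brown fox jumps over".split()
for gram in ngrams(words, 5):
    print(" ".join(gram))
# prints:
# the quick brown fox jumps
# quick brown fox jumps over
```

Each step of the window drops the oldest token and appends the next one, which is why a text of k tokens yields k - n + 1 records.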
According to a survey held in the United States between March and April 2020, 70 percent of respondents said that they read print books the most, with 39 percent of those consumers preferring their books to be new.
The study was conducted as the U.S. went into lockdown to prevent the spread of the coronavirus. Although the virus certainly affected media consumption in the United States, what did not change was consumers' book preferences. Print has always been the most popular book format in the U.S., and figures on increased media consumption during the pandemic showed that even Gen Z, a generation famed for loving digital, were the most likely to be reading books more than usual during the outbreak.
Book consumption in the U.S.
Whilst printed newspapers and magazines have struggled to survive as digital formats grow ever more prevalent and appealing, when it comes to books U.S. consumers still have a clear preference for print. Annual survey data consistently shows that U.S. adults are far more likely to have read a print book in the last year than a digital version thereof, and whilst the popularity of digital books has increased, print remains the favorite.
As far as book buying goes, whilst the number of print books sold in the U.S. fluctuates each year, the figures remain relatively stable. Although unit sales have not surpassed 700 million since 2010, the number came close in 2018 and yearly sales from 2015 to 2019 were higher than the amount recorded in 2004.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These data were gathered during the Books, Minds, and Bodies research project in 2015-16. The project was designed to investigate the therapeutic potential of shared reading, and involved running 2 reading groups over 2 consecutive terms and recording participants' discussions of the texts being read aloud together. These recordings were subsequently transcribed and used for analysis of emotional variance and linguistic similarity.
Consistent with the ethical approval granted for the study, word order in the transcripts has been randomized so as to preclude any personal data being disclosed. This was done by tokenizing the text of each transcript into grammatical and lexical units (i.e. punctuation signs and words). These were shuffled using the "Random" module in the Python programming language, which provides a range of mathematical operations for collections of discrete objects. Nevertheless, grouping variables were preserved at the level of group (MT and HT terms) and session ID. As the calculation of values for emotional variance (on the dimensions of valence, arousal, and dominance) does not require syntax to be preserved, randomizing the data in this way should not affect the future calculation of word norm values.
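The randomization step described above can be approximated as follows. This is a hedged sketch and not the project's actual script: the regular expression used for tokenization, the function name `shuffle_transcript`, and the optional `seed` parameter are all assumptions. Only the use of Python's `random` module comes from the description.

```python
import random
import re
from typing import List, Optional

# Assumed tokenization: runs of word characters (keeping internal
# apostrophes, as in "didn't") or single punctuation signs.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def shuffle_transcript(text: str, seed: Optional[int] = None) -> List[str]:
    """Split a transcript into words and punctuation, then shuffle the tokens."""
    tokens = TOKEN_PATTERN.findall(text)
    rng = random.Random(seed)  # a seed makes the shuffle reproducible
    rng.shuffle(tokens)
    return tokens
```

Shuffling changes only the order of tokens, so any measure computed per word, such as valence, arousal, and dominance norms, yields the same totals before and after.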
The dataset also includes text/discussion similarity calculations, qualitative coding results, and participants' post-participation feedback data.
NB: this dataset replaces 'Books, Minds, and Bodies: raw transcript text plus VAD values' at https://ora.ox.ac.uk/objects/uuid:c370b75b-d37e-41be-89bb-cbb67a0c8614
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Don't worry about it, This is professionals work right here!
Goodreads Book Descriptions
A dataset of English book titles and descriptions from Goodreads. The original dataset has 2.3 million books total with many more fields. There may exist a small number of non-English books in this dataset.
Citations
Mengting Wan, Julian McAuley, "Item Recommendation on Monotonic Behavior Chains", in RecSys'18. Mengting Wan, Rishabh Misra, Ndapa Nakashole, Julian McAuley, "Fine-Grained Spoiler Detection from Large-Scale Review Corpora", in… See the full description on the dataset page: https://huggingface.co/datasets/booksouls/goodreads-book-descriptions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 534 rows and is filtered where the book publisher is Houghton Mifflin. It features 7 columns including author, publication date, language, and book publisher.
As of December 2021, just ** percent of surveyed women in the United States said that they had not read any books in the last year, ten percent less than the share of men who said the same. Both male and female respondents were most likely to have read *** to **** books in the year leading to the survey, though **** percent of women reported having read more than ** books in that time.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Ministry of Education's Basic Education Statistical Booklet captures national statistics for the education sector as a whole. This dataset highlights the number of primary school textbooks per subject in each county. Source: Table 51 - Primary Schools Text Books
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 377 rows and is filtered where the book publisher is Bantam Books. It features 7 columns including author, publication date, language, and book publisher.