Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Vector store of embeddings for books
Books: "1984" by George Orwell; "The Almanac of Naval Ravikant" by Eric Jorgenson.
This is a FAISS vector store created with Instructor embeddings using LangChain. Use it for similarity search, question answering, or anything else that leverages embeddings! 😃 Creating these embeddings can take a while, so here's a convenient, downloadable one 🤗
How to use
Specify the book from one of the following: "1984" "The Almanac of Naval… See the full description on the dataset page: https://huggingface.co/datasets/calmgoose/book-embeddings.
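The card's usage instructions are truncated here, so the actual loading code is not reproduced. To give a rough idea of what similarity search over stored embeddings does, the following sketch ranks passages by cosine similarity in plain Python. It does not use the LangChain or FAISS API. The toy three-dimensional vectors and the `top_k` helper are hypothetical. FAISS replaces this brute-force loop with optimized index structures, and its default distance metric may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], store: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force search: score every stored passage, keep the k most similar."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real Instructor embeddings have far more dimensions.
store = {
    "passage about surveillance": [0.9, 0.1, 0.0],
    "passage about wealth": [0.0, 0.2, 0.9],
    "passage about Newspeak": [0.8, 0.3, 0.1],
}
print(top_k([1.0, 0.2, 0.0], store, k=2))
```

In a real pipeline, the query vector would come from the same embedding model that built the store. Otherwise the similarity scores are meaningless.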
Book-Crossing dataset mined by Cai-Nicolas Ziegler
Freely available for research use when acknowledged with the following reference (further details on the dataset are given in this publication):
Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen; Proceedings of the 14th International World Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan. To appear.
Further information and the original dataset can be found at the original webpage.
Changes to the dataset:
Note:
Goodreads Book Descriptions
A dataset of English book titles and descriptions from Goodreads. The original dataset has 2.3 million books in total and many more fields. A small number of non-English books may be present in this dataset.
Citations
Mengting Wan, Julian McAuley, "Item Recommendation on Monotonic Behavior Chains", in RecSys'18. Mengting Wan, Rishabh Misra, Ndapa Nakashole, Julian McAuley, "Fine-Grained Spoiler Detection from Large-Scale Review Corpora", in… See the full description on the dataset page: https://huggingface.co/datasets/booksouls/goodreads-book-descriptions.
This data was gathered as part of the data mining project for General Assembly Data Science Immersive course.
This data was acquired from the Google Books store using the Google API. Nine features were gathered for each book in the dataset. The column names are mostly self-explanatory; nevertheless, they are explained below.
I would like to thank Google for making a free API available for its services and websites. I would also like to acknowledge the effort of the developer of the web scraper extension; it is a really nice and powerful tool for web scraping.
©2019 Google
Here is a story: you love reading books, and recently you bought a book that you thought you would like. However, after reading half of it, you still don't feel the enthusiasm and joy you expected. I think that machine learning algorithms might help solve such problems.
Reading books remains a popular pastime for U.S. adults: ** percent of respondents to a 2021 survey said that they had read a book in any format within the last year. Although online media formats are now the preferred option for many consumers when it comes to television, music, and gaming, print books are by far the most popular format among readers in the United States. Almost double the share of adults now read audiobooks compared to 2011, yet only ** percent claimed to have read an audiobook in the last year, compared to ** percent who said that they had read a print book.
Book sales in the United States
In 2020, bookstore sales in the United States amounted to **** billion U.S. dollars. Sales in 2019 and 2020 were the lowest recorded since the early *****, and the combined effect of the coronavirus outbreak and the growing appeal of online purchasing will likely mean that bookstore sales continue to drop. Bookstores tend to see the most success in August, December, and January, and sales revenue often surpasses *********** U.S. dollars in those months each year. That said, monthly retail sales of bookstores in the U.S. are notably lower overall than in previous years and were particularly poor in spring 2020 as a result of national shutdowns to stem the spread of COVID-19.
Influence of COVID-19 on reading habits
The coronavirus pandemic led to increased media consumption in general, and not only among avid video and music streaming fans. Data from a survey in March 2020 revealed that ** percent of Millennials read more books due to the COVID-19 outbreak, making consumers in this group the most likely to have done so, compared to ** percent of the total survey sample. Meanwhile, ** percent of Boomers said that their reading habits had not changed.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The dataset was collected as part of Prac1 of the course Typology and Data Life Cycle in the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).
The dataset contains 25 variables and 52,478 records corresponding to books on the Goodreads Best Books Ever list (the largest list on the site).
The original code used to retrieve the dataset can be found in the GitHub repository github.com/scostap/goodreads_bbe_dataset.
The data was retrieved in two sets: first 30,000 books, then the remaining 22,478. Dates in the second chunk were not parsed and reformatted, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30,000 records and in "Month Day Year" format for the rest.
Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repository.
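To work with a single date type, both representations can be normalized with Python's standard library. The following is a minimal sketch, and the exact "Month Day Year" spelling is an assumption. The parser expects full English month names (e.g. "September 14 2008") and strips ordinal suffixes such as "14th" in case they occur. As a fallback, it also accepts two-digit years in the slash format. Values it cannot parse, including missing values such as pandas NaN, come back as None rather than raising.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumed: "Month Day Year" values may carry ordinal suffixes like "14th",
# which strptime cannot parse, so they are stripped first.
_ORDINAL = re.compile(r"(\d+)(st|nd|rd|th)\b")

# Tried in order: mm/dd/yyyy (first 30,000 records), a two-digit-year
# fallback, then "Month Day Year" (remaining records).
_FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%B %d %Y")


def parse_publish_date(raw: object) -> Optional[date]:
    """Normalize a publishDate/firstPublishDate value to datetime.date, or None."""
    if not isinstance(raw, str) or not raw.strip():
        return None  # empty string, None, NaN, or other non-string value
    text = _ORDINAL.sub(r"\1", raw.strip())
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next known format
    return None
```

With pandas, the function can be applied column-wise, e.g. `df["publishDate"].map(parse_publish_date)`.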
The 25 fields of the dataset are:
| Attribute | Definition | Completeness (%) |
| ------------- | ------------- | ------------- |
| bookId | Book identifier, as on goodreads.com | 100 |
| title | Book title | 100 |
| series | Series name | 45 |
| author | Book's author | 100 |
| rating | Global Goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (e.g., Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Publisher | 93 |
| publishDate | Publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Total number of ratings | 100 |
| ratingsByStars | Number of ratings per star level | 97 |
| likedPercent | Derived field: percentage of ratings above 2 stars (as on Goodreads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL of the cover image | 99 |
| bbeScore | Score on the Best Books Ever list | 100 |
| bbeVotes | Number of votes on the Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
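The table describes likedPercent as a field derived from the ratings, so it can in principle be recomputed from ratingsByStars. The following is a minimal sketch. It assumes the star counts have already been converted into a mapping from star value (1 to 5) to count. The order in which ratingsByStars stores its counts is not documented here, so that conversion is left to the reader. The stored likedPercent may also be rounded.

```python
from typing import Mapping


def liked_percent(ratings_by_stars: Mapping[int, int]) -> float:
    """Percentage of ratings strictly above 2 stars (i.e. 3, 4 or 5 stars)."""
    total = sum(ratings_by_stars.values())
    if total == 0:
        return 0.0  # no ratings: avoid dividing by zero
    liked = sum(count for stars, count in ratings_by_stars.items() if stars > 2)
    return 100.0 * liked / total


# Hypothetical counts: 90 of 100 ratings are 3 stars or more.
print(liked_percent({5: 50, 4: 30, 3: 10, 2: 6, 1: 4}))
```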
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Pen And Book is a dataset for object detection tasks. It contains Pen and Book annotations for 578 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 129 rows, filtered to books whose publisher is the National Book League. It features 7 columns, including author, publication date, language, and book publisher.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Book_Title_BUID
This dataset consists of book titles with corresponding unique identifiers (UIDs) that can be used as labels. It was created to support projects requiring a standardized way of referencing books by both their titles and unique labels. The dataset is intended for use in classification tasks, document processing, and training AI models where accurate identification of books is necessary. This dataset card aims to be a base template for new datasets. It… See the full description on the dataset page: https://huggingface.co/datasets/R3troR0b/book-title_BUID.
Because of the sheer number of products available, the German book market is one of the largest trading sectors today. To present a highly individual profile to customers while keeping the effort involved in selecting and ordering as low as possible, a bookshop's key to success lies in purchasing effectively from a choice of roughly 96,000 new titles each year. The challenge for the bookseller is to buy the right amount of the right books at the right time.
It is with this in mind that this year's DATA MINING CUP Competition will be held in cooperation with Libri, Germany's leading book wholesaler. Among Libri's many successful support measures for booksellers, purchase recommendations give the bookshop a competitive advantage. Accordingly, the DATA MINING CUP 2009 challenge will be to forecast purchase quantities of a clearly defined title portfolio per location, using simulated data.
The task of the DATA MINING CUP Competition 2009 is to forecast purchase quantities of 8 titles for 2,418 different locations. To build the model, simulated purchase data from an additional 2,394 locations will be supplied. All data refers to a fixed period of time. The objective is to forecast the purchase quantities of these 8 titles for the 2,418 locations as accurately as possible.
Two text files are available to assist in solving the problem: dmc2009_train.txt (training data) and dmc2009_forecast.txt (data for the 2,418 locations for which a prediction is to be made).
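A common first step in a forecasting task like this is a naive baseline that any real model should beat. The following is a minimal sketch, not the competition's method or scoring. It predicts, for every forecast location, the rounded mean quantity of each title across the training locations. The in-memory layout (location id mapped to a list of quantities, one per title in a fixed order) is hypothetical. Parsing dmc2009_train.txt into that shape is omitted because the file's column layout is not described here.

```python
from statistics import mean
from typing import Dict, List


def fit_title_means(train: Dict[str, List[int]]) -> List[float]:
    """Average purchase quantity per title across all training locations."""
    lengths = {len(quantities) for quantities in train.values()}
    if len(lengths) != 1:
        raise ValueError("training data must be non-empty with one quantity per title")
    # zip(*rows) transposes the location rows into one column per title.
    return [mean(column) for column in zip(*train.values())]


def forecast(locations: List[str], title_means: List[float]) -> Dict[str, List[int]]:
    """Naive baseline: every location gets the rounded training mean per title."""
    rounded = [round(m) for m in title_means]
    return {location: list(rounded) for location in locations}


# Hypothetical training rows: 2 locations x 8 titles.
train = {"loc_a": [1, 2, 3, 4, 5, 6, 7, 8], "loc_b": [3, 2, 1, 0, 5, 6, 7, 8]}
print(forecast(["loc_x"], fit_title_means(train)))
```

Because a per-title mean ignores location entirely, it mainly serves as a reference point for judging whether location-aware models add anything.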
This data is publicly available on the DATA MINING CUP website.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Dataset Card for llm-book/ner-wikipedia-dataset
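This is the "Japanese Named Entity Recognition Dataset Using Wikipedia" (Version 2.0), created by Stockmark Inc. and used in the book 『大規模言語モデル入門』 (Introduction to Large Language Models). It uses the dataset published in the GitHub repository stockmarkteam/ner-wikipedia-dataset.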
Citation
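@inproceedings{omi-2021-wikipedia, title = "Wikipediaを用いた日本語の固有表現抽出のデータセットの構築 [Construction of a Japanese Named Entity Recognition Dataset Using Wikipedia]", author = "近江 崇宏", booktitle = "言語処理学会第27回年次大会 [27th Annual Meeting of the Association for Natural Language Processing]", year = "2021", url = "https://anlp.jp/proceedings/annual_meeting/2021/pdf_dir/P2-7.pdf", }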
Licence… See the full description on the dataset page: https://huggingface.co/datasets/llm-book/ner-wikipedia-dataset.
https://dataintelo.com/privacy-and-policy
According to our latest research, the AI-Generated Children’s Book market size reached USD 1.32 billion in 2024, demonstrating robust momentum driven by technological advancements and shifting consumer preferences. The market is expected to grow at a CAGR of 18.4% from 2025 to 2033, propelled by the increasing adoption of AI in creative content generation and personalized learning experiences. By 2033, the global market value is forecasted to reach USD 6.23 billion, reflecting the rapid integration of AI tools in educational and entertainment publishing for children. This growth is underpinned by rising demand for interactive, customized, and educational content, as well as the proliferation of digital platforms facilitating easy access to AI-powered children’s books.
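The market-size figures above are linked by compound growth: an end value equals the start value times (1 + r) raised to the number of years. The following is a minimal sketch for checking or reproducing such projections. The function names are illustrative. Counting nine compounding years from the 2024 value of USD 1.32 billion to the 2033 value of USD 6.23 billion implies a rate of roughly 18.8%, close to the stated 18.4%. The report does not say which base year or rounding it uses, so exact agreement should not be expected.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, end value and period."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding start_value at `rate` per year for `years` years."""
    return start_value * (1.0 + rate) ** years


# Implied rate between the 2024 and 2033 figures (USD billions), as a percentage.
print(f"{cagr(1.32, 6.23, 9):.1%}")
```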
The acceleration in the AI-Generated Children’s Book market is primarily attributed to the growing emphasis on personalized learning and the need for engaging educational content. AI technologies enable publishers and content creators to produce books tailored to individual reading levels, interests, and cultural backgrounds, significantly enhancing the learning experience for children. The ability of AI to generate diverse storylines, adapt language complexity, and incorporate interactive elements makes these books particularly appealing to both parents and educators. Furthermore, the integration of AI in book creation reduces production time and costs, allowing for a more agile response to changing educational standards and reader preferences. These factors collectively contribute to the market's rapid expansion, especially in regions with high digital literacy rates and a strong focus on educational innovation.
Another significant growth driver for the AI-Generated Children’s Book market is the increasing penetration of digital devices among young readers. The widespread availability of tablets, e-readers, and smartphones has transformed how children consume content, shifting the focus from traditional print to interactive digital formats. AI-generated books leverage multimedia elements such as audio narration, animations, and gamified activities, making reading more immersive and accessible. This digital transformation aligns with the preferences of tech-savvy parents who seek innovative educational tools for their children. Additionally, the flexibility of AI-generated content supports multilingual capabilities, enabling publishers to cater to diverse linguistic markets and expand their global reach.
The surge in demand for educational resources during and after the COVID-19 pandemic has further accelerated the adoption of AI-generated children’s books. With remote learning becoming a norm, schools and parents are increasingly turning to digital solutions that offer personalized and adaptive learning experiences. AI-generated books can be seamlessly integrated into online curricula, providing teachers with valuable tools to track student progress and customize assignments. This trend is particularly evident in markets with advanced educational infrastructure and strong government support for digital literacy initiatives. As a result, the AI-Generated Children’s Book market is poised for sustained growth, driven by the convergence of technological innovation, changing consumer behavior, and the evolving landscape of education.
Regionally, North America leads the AI-Generated Children’s Book market, accounting for over 37% of the global revenue in 2024, followed closely by Europe and Asia Pacific. The dominance of North America can be attributed to the presence of major technology companies, high digital adoption rates, and significant investments in EdTech. Europe’s market is bolstered by strong educational policies and multilingual populations, while Asia Pacific is emerging as a high-growth region due to its large youth demographic and increasing internet penetration. Latin America and the Middle East & Africa are also witnessing steady growth, supported by government initiatives to improve digital education and access to learning resources. These regional dynamics underscore the global appeal and scalability of AI-generated children’s books across diverse markets.
The Product Type segment within the AI-Generated Children’s Book market encompasses personalized books, educational books, storybooks, activity books, and other
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Phone Book is a dataset for object detection tasks. It contains Phones, Books, and Phone annotations for 791 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
According to our latest research, the global online book services market size reached USD 23.7 billion in 2024, reflecting robust digital adoption across multiple verticals. The market is experiencing a steady expansion, with a CAGR of 8.9% anticipated from 2025 to 2033. By the end of the forecast period, the market is projected to attain a value of USD 50.1 billion. This growth is propelled by factors such as the increasing penetration of internet-enabled devices, evolving consumer preferences for digital content, and rising investments in digital education platforms. As per our latest research, the market is witnessing significant transformation, driven by technological advancements and the expanding reach of online book providers worldwide.
The primary growth driver for the online book services market is the widespread adoption of smartphones, tablets, and e-readers, which has fundamentally changed how consumers access and consume literary content. The convenience of instant access to a vast array of books, magazines, and journals has made digital platforms highly attractive, especially among younger demographics and urban populations. Furthermore, the integration of advanced features such as cloud libraries, personalized recommendations, and interactive content has enhanced user engagement and retention. The proliferation of high-speed internet and affordable data plans in both developed and emerging markets has further facilitated this shift toward digital reading, making online book services increasingly mainstream.
Another significant factor contributing to the growth of the online book services market is the evolution of content delivery models, particularly the rise of subscription-based services. Subscription models offer users unlimited access to extensive collections of e-books and audiobooks for a fixed monthly fee, making digital reading more economical and accessible. This model has gained widespread acceptance among both individual readers and institutional users, including schools, universities, and libraries. Additionally, the ongoing digital transformation in the education sector, accelerated by the COVID-19 pandemic, has led to a surge in demand for online textbooks and academic resources, further expanding the market's reach and relevance.
Technological innovation continues to play a pivotal role in shaping the online book services market. Enhanced user experiences through artificial intelligence-driven recommendations, voice-enabled navigation in audiobooks, and integration with smart home devices are redefining how content is consumed. The development of multilingual platforms and region-specific content libraries is helping service providers cater to diverse global audiences. Moreover, partnerships between publishers and online platforms are ensuring timely and exclusive releases, adding value for subscribers. As competition intensifies, leading players are investing in secure digital rights management and seamless cross-device synchronization to maintain user trust and loyalty.
In the evolving landscape of digital content consumption, Book Discovery Platforms have emerged as pivotal tools for readers and publishers alike. These platforms are designed to enhance the discoverability of books, leveraging sophisticated algorithms and user data to recommend titles that align with individual preferences. By offering personalized reading suggestions, these platforms not only enrich the user experience but also drive engagement and sales for publishers. As the online book services market continues to grow, the role of Book Discovery Platforms becomes increasingly significant, providing a competitive edge to service providers who integrate these technologies into their offerings. The seamless integration of discovery features into digital libraries ensures that readers can easily navigate vast collections, uncovering new authors and genres that they might not have encountered otherwise.
Regionally, the online book services market exhibits distinct growth patterns, with North America and Europe leading in terms of market share due to high digital literacy rates and established publishing ecosystems. However, the Asia Pacific region is emerging as the fastest-growing market, driven by a burgeoning middle class, increasing smartphone penetration, and government init
The Approved Drug Products with Therapeutic Equivalence (Orange Book or OB) is a list of drugs approved under Section 505 of the Federal Food, Drug and Cosmetic Act and provides consumers timely updates on these products. In addition to these products (fo
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Breakdown of Revenue by Media Type: Books - Print Books for Book Publishers, All Establishments, Employer Firms (RPCMPBEF51113ALLEST) from 2013 to 2022 about book, printing, employer firms, accounting, revenue, establishments, services, and USA.
According to a survey held in the United States between March and April 2020, 70 percent of respondents said that they read print books the most, with 39 percent of those consumers preferring their books to be new.
The study was conducted as the U.S. went into lockdown to prevent the spread of the coronavirus. However, although the virus certainly affected media consumption in the United States, consumers' book preferences did not change. Print has always been the most popular book format in the U.S., and figures on increased media consumption during the pandemic showed that even Gen Z, a generation famed for loving digital, were the most likely to be reading more books than usual during the outbreak.
Book consumption in the U.S.
Whilst printed newspapers and magazines have struggled to survive as digital formats grow ever more prevalent and appealing, when it comes to books U.S. consumers still have a clear preference for print. Annual survey data consistently shows that U.S. adults are far more likely to have read a print book in the last year than a digital version thereof, and whilst the popularity of digital books has increased, print remains the favorite.
As far as book buying goes, whilst the number of print books sold in the U.S. fluctuates each year, the figures remain relatively stable. Although unit sales have not surpassed 700 million since 2010, the number came close in 2018 and yearly sales from 2015 to 2019 were higher than the amount recorded in 2004.
According to the results of a survey held in late 2022, American book lovers and readers still read print books the most, with ** percent having read a paperback or hardcover book that year. Kindle and other e-books were an appealing option with ** percent having read one in 2022, whereas audiobooks were substantially less popular.
This explorer makes it quick to filter the State of Iowa Salary Book data. The salary book provides the name, gender, county or city of residence (when possible), official title, total salary received during each fiscal year, base salary for the employee, and traveling and subsistence expenses reimbursed to state personnel, beginning with Fiscal Year 2007.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This horizontal bar chart displays a count of books, filtered to titles whose publisher is Book Ripple Publishing. The data is about books.