100+ datasets found
  1. Best Books Ever Dataset

    • zenodo.org
    csv
    Updated Nov 10, 2020
    Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
    Available download formats: csv
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lorena Casanova Lozano; Sergio Costa Planells
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset was collected as part of Prac1 of the subject Typology and Data Life Cycle in the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).

    The dataset contains 25 variables and 52478 records corresponding to books on the Goodreads Best Books Ever list (the largest list on the site).

    The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset

    The data was retrieved in two sets: first 30000 books, then the remaining 22478. Dates were not parsed and reformatted in the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and in "Month Day Year" format for the rest.
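    A loader therefore has to accept both date layouts. Below is a minimal Python sketch, assuming the first chunk follows mm/dd/yyyy as described (with a two-digit-year fallback) and that the "Month Day Year" text may carry commas or ordinal suffixes such as "14th"; both variations are assumptions about the raw CSV, not documented behaviour.

```python
import re
from datetime import date, datetime
from typing import Optional

# mm/dd/yyyy as described for the first 30000 records; "%m/%d/%y"
# (two-digit year) is an assumed fallback.
_NUMERIC_FORMATS = ("%m/%d/%Y", "%m/%d/%y")
# "Month Day Year", e.g. "September 14 2008" (full or abbreviated month).
_TEXT_FORMATS = ("%B %d %Y", "%b %d %Y")
# Assumed ordinal suffixes such as "14th", stripped before parsing.
_ORDINAL = re.compile(r"(\d{1,2})(st|nd|rd|th)\b")


def parse_book_date(raw: Optional[str]) -> Optional[date]:
    """Parse publishDate/firstPublishDate from either chunk; None if unparseable."""
    if not raw or not raw.strip():
        return None
    text = _ORDINAL.sub(r"\1", raw.strip())
    # Drop commas and collapse repeated whitespace.
    text = " ".join(text.replace(",", " ").split())
    for fmt in _NUMERIC_FORMATS + _TEXT_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```

    Trying the formats in order and returning None on failure keeps one bad value from aborting a whole load; rows that come back as None can then be inspected by hand.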

    Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repository.

    The 25 fields of the dataset are:

    | Attribute | Definition | Completeness (%) |
    | ------------- | ------------- | ------------- |
    | bookId | Book identifier as used on goodreads.com | 100 |
    | title | Book title | 100 |
    | series | Series name | 45 |
    | author | Book's author | 100 |
    | rating | Global Goodreads rating | 100 |
    | description | Book's description | 97 |
    | language | Book's language | 93 |
    | isbn | Book's ISBN | 92 |
    | genres | Book's genres | 91 |
    | characters | Main characters | 26 |
    | bookFormat | Type of binding | 97 |
    | edition | Type of edition (e.g., Anniversary Edition) | 9 |
    | pages | Number of pages | 96 |
    | publisher | Publisher | 93 |
    | publishDate | Publication date | 98 |
    | firstPublishDate | Publication date of first edition | 59 |
    | awards | List of awards | 20 |
    | numRatings | Total number of ratings | 100 |
    | ratingsByStars | Number of ratings for each star level | 97 |
    | likedPercent | Derived field: percentage of ratings above 2 stars (as on Goodreads) | 99 |
    | setting | Story setting | 22 |
    | coverImg | URL of the cover image | 99 |
    | bbeScore | Score in the Best Books Ever list | 100 |
    | bbeVotes | Number of votes in the Best Books Ever list | 100 |
    | price | Book's price (extracted from Iberlibro) | 73 |
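    The likedPercent field can in principle be recomputed from ratingsByStars. Below is a minimal sketch, assuming ratingsByStars stores five counts ordered from 5 stars down to 1 star as a list-literal string, and that the percentage is rounded to a whole number; neither detail is documented above.

```python
import ast
from typing import List, Optional, Sequence


def parse_ratings_by_stars(raw: str) -> List[int]:
    """Parse an assumed list-literal string such as "['10', '5', '3', '1', '1']"."""
    return [int(x) for x in ast.literal_eval(raw)]


def liked_percent(counts_5_to_1: Sequence[int]) -> Optional[int]:
    """Percentage of ratings above 2 stars (3, 4 or 5 stars), rounded.

    Assumes the counts are ordered from 5 stars down to 1 star.
    """
    if len(counts_5_to_1) != 5:
        raise ValueError("expected exactly five star counts")
    total = sum(counts_5_to_1)
    if total == 0:
        return None  # no ratings, so the percentage is undefined
    liked = sum(counts_5_to_1[:3])  # the 5-, 4- and 3-star counts
    return round(100 * liked / total)
```

    Comparing this value with the stored likedPercent column is a quick way to confirm (or refute) the ordering assumption on real rows.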

  2. Goodreads Book Reviews

    • kaggle.com
    Updated Oct 30, 2023
    Ahmad (2023). Goodreads Book Reviews [Dataset]. https://www.kaggle.com/datasets/pypiahmad/goodreads-book-reviews1/code
    Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 30, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ahmad
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The Goodreads Book Reviews dataset contains reviews and various attributes of the books listed on the Goodreads platform. A distinguishing feature of this dataset is that it captures multiple tiers of user interaction, from adding a book to a "shelf" to rating and reading it. It is useful for studying user behavior, book recommendations, sentiment analysis, and the interplay between book attributes and user interactions.

    Basic statistics:

    • Items: 1,561,465
    • Users: 808,749
    • Interactions: 225,394,930
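    These counts indicate how sparse the user-item interaction matrix is, which matters when choosing a recommendation model. A quick Python check, assuming every interaction is a distinct (user, item) pair (repeated interactions on the same pair would make the true density lower):

```python
# Figures copied from the basic statistics of the Goodreads Book Reviews dataset.
ITEMS = 1_561_465
USERS = 808_749
INTERACTIONS = 225_394_930


def interaction_summary(items: int, users: int, interactions: int) -> dict:
    """Average interactions per user/item and density of the user-item matrix."""
    return {
        "per_user": interactions / users,
        "per_item": interactions / items,
        # share of all possible (user, item) pairs that have an interaction
        "density": interactions / (items * users),
    }


summary = interaction_summary(ITEMS, USERS, INTERACTIONS)
for key, value in summary.items():
    print(f"{key}: {value:.6g}")
```

    With these figures the averages come to roughly 279 interactions per user and 144 per item, while the density is only about 0.018%, i.e. the matrix is more than 99.9% empty.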

    Metadata:

    • Reviews: The text of the reviews provided by users.
    • Add-to-shelf, read, and review actions: The various interactions users have with the books.
    • Book attributes: Attributes describing the books, including title and ISBN.
    • Graph of similar books: A graph of similarity relations between books.

    Example (interaction data):

    ```json
    {
      "user_id": "8842281e1d1347389f2ab93d60773d4d",
      "book_id": "130580",
      "review_id": "330f9c153c8d3347eb914c06b89c94da",
      "isRead": true,
      "rating": 4,
      "date_added": "Mon Aug 01 13:41:57 -0700 2011",
      "date_updated": "Mon Aug 01 13:42:41 -0700 2011",
      "read_at": "Fri Jan 01 00:00:00 -0800 1988",
      "started_at": ""
    }
    ```
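    The date fields in the interaction example use a fixed layout with a numeric UTC offset, and unset fields (like started_at) are empty strings. Below is a minimal Python sketch for decoding one record, assuming every non-empty timestamp follows the layout shown in the example:

```python
import json
from datetime import datetime
from typing import Optional

# Layout of the example timestamps, e.g. "Mon Aug 01 13:41:57 -0700 2011".
GOODREADS_TS = "%a %b %d %H:%M:%S %z %Y"
DATE_FIELDS = ("date_added", "date_updated", "read_at", "started_at")


def parse_ts(value: str) -> Optional[datetime]:
    """Return a timezone-aware datetime, or None for empty fields."""
    return datetime.strptime(value, GOODREADS_TS) if value else None


def parse_interaction(line: str) -> dict:
    """Decode one interaction record and convert its date fields in place."""
    record = json.loads(line)
    for field in DATE_FIELDS:
        record[field] = parse_ts(record.get(field, ""))
    return record
```

    Keeping the offset (via `%z`) makes the datetimes timezone-aware, so records from different offsets can be compared or converted to UTC safely.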

    Use cases:

    • Book recommendations: Creating personalized book recommendations based on user interactions and preferences.
    • Sentiment analysis: Analyzing sentiment in reviews and understanding how different book attributes influence sentiment.
    • User behavior analysis: Understanding how users interact with books and deriving insights to enhance user engagement.
    • Natural language processing: Training models to process and analyze user-generated text in reviews.
    • Similarity analysis: Analyzing the graph of similar books to understand book similarities and clusters.

    Citation: Please cite the following if you use the data: Mengting Wan, Julian McAuley. "Item recommendation on monotonic behavior chains." RecSys, 2018. [PDF](https://cseweb.ucsd.edu/~jmcauley/pdfs/recsys18e.pdf)

    Code samples: A curated set of code samples is provided in the dataset's GitHub repository to make working with the datasets easier. These include:

    • Downloading datasets without a GUI: Downloading the data in a non-GUI environment.
    • Displaying sample records: Showing sample records to illustrate the dataset structure.
    • Calculating basic statistics: Computing basic statistics to understand the dataset's distribution and characteristics.
    • Exploring the interaction data: Examining user-book interaction patterns.
    • Exploring the review data: Analyzing review data to extract insights from user reviews.

    Additional dataset:

    • Complete book reviews (~15M multilingual reviews about ~2M books and 465k users): a comprehensive, multilingual collection of reviews of around 2 million books from 465,000 users.

    Datasets:

    Meta-Data of Books:

    • Detailed Book Graph (goodreads_books.json.gz): A comprehensive graph detailing around 2.3 million books, serving as a rich source of book attributes and metadata.
    • Detailed Information of Authors (goodreads_book_authors.json.gz): Detailed information about book authors, useful for author-centric analyses.
    • Detailed Information of Works (goodreads_book_works.json.gz): Abstract information about each book independent of any particular edition, giving a high-level view of each work.
    • Detailed Information of Book Series (goodreads_book_series.json.gz): Detailed information about book series, useful for series-related analyses. Note that the series IDs included here cannot be used to construct Goodreads series URLs.
    • Extracted Fuzzy Book Genres (goodreads_book_genres_initial.json....
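    The metadata files above are large gzipped JSON files. Below is a minimal sketch for streaming records without loading a whole file into memory, for example to display a few sample records on a machine without a GUI. It assumes each file stores one JSON object per line; that layout is an assumption, so check a file before relying on it.

```python
import gzip
import json
from itertools import islice
from typing import BinaryIO, Iterator, List, Union


def iter_json_gz(source: Union[str, BinaryIO]) -> Iterator[dict]:
    """Yield one record per non-empty line of a gzipped JSON-lines file.

    `source` may be a path or an open binary file object.
    """
    with gzip.open(source, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def head(source: Union[str, BinaryIO], n: int = 3) -> List[dict]:
    """Return the first n records, e.g. to inspect the structure of a file."""
    return list(islice(iter_json_gz(source), n))
```

    For example, `head("goodreads_book_authors.json.gz")` reads only as much of the file as it needs for three records, because the generator stops decompressing once `islice` is satisfied.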
  3. Dataset: Books

    • figshare.com
    application/gzip
    Updated Jan 31, 2023
    SN SciGraph Team (2023). Dataset: Books [Dataset]. http://doi.org/10.6084/m9.figshare.7739084.v2
    Available download formats: application/gzip
    Dataset updated
    Jan 31, 2023
    Dataset provided by
    SN SciGraph
    Authors
    SN SciGraph Team
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The books dataset includes information about all published books from Springer Nature. See also: https://scigraph.springernature.com/explorer/datasets/data_at_a_glance/

    A book record usually includes information about the chapters it contains, external identifiers, authors, editors and their affiliations, links to related grants, subjects, and the abstract when available.

    Version info:

    • http://scigraph.downloads.uberresearch.com/archives/current/TIMESTAMP.txt
    • http://scigraph.downloads.uberresearch.com/archives/current/LICENSE.txt

  4. Breakdown of Revenue by Media Type: Books - Print Books for Book Publishers,...

    • fred.stlouisfed.org
    json
    Updated Jan 31, 2024
    (2024). Breakdown of Revenue by Media Type: Books - Print Books for Book Publishers, All Establishments, Employer Firms [Dataset]. https://fred.stlouisfed.org/series/RPCMPBEF51113ALLEST
    Available download formats: json
    Dataset updated
    Jan 31, 2024
    License

    https://fred.stlouisfed.org/legal/#copyright-public-domain

    Description

    Graph and download economic data for Breakdown of Revenue by Media Type: Books - Print Books for Book Publishers, All Establishments, Employer Firms (RPCMPBEF51113ALLEST) from 2013 to 2022 about book, printing, employer firms, accounting, revenue, establishments, services, and USA.

  5. Library New Titles - Large Print Books

    • catalog.data.gov
    • data.lacity.org
    Updated Dec 2, 2020
    data.lacity.org (2020). Library New Titles - Large Print Books [Dataset]. https://catalog.data.gov/dataset/library-new-titles-large-print-books
    Dataset updated
    Dec 2, 2020
    Dataset provided by
    data.lacity.org
    Description

    The latest titles in large-print format at the Los Angeles Public Library (LAPL), updated weekly.

  6. Number of book piracy downloads in the U.S. 2017, by method

    • statista.com
    Updated Mar 22, 2021
    Statista (2021). Number of book piracy downloads in the U.S. 2017, by method [Dataset]. https://www.statista.com/statistics/688228/book-piracy-download-number-method/
    Dataset updated
    Mar 22, 2021
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2017
    Area covered
    United States
    Description

    The statistic presents data on the average number of pirated e-books downloaded per user in the past 12 months in the United States in 2017. Illegal downloaders obtained an average of 3.14 illegal e-books from a friend in the past 12 months.

  7. Project Gutenberg Book Corpus

    • opendatabay.com
    Updated Jul 3, 2025
    Datasimple (2025). Project Gutenberg Book Corpus [Dataset]. https://www.opendatabay.com/data/ai-ml/0979850d-7ed8-4aeb-887d-4ad585d2f661
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    Education & Learning Analytics
    Description

    This dataset is a collection of over 15,000 book texts, complete with their authors and titles. It has been compiled by scraping the Project Gutenberg website, specifically parsing its bookshelves. The dataset includes metadata such as titles, authors, categories (bookshelves), and download links for the book texts. Some books from Project Gutenberg are not included if they haven't been categorised. Notably, the dataset also retains audiobooks, offering flexibility for users interested in audio data alongside text.

    Columns

    The dataset primarily includes the following columns:

    • Title: The title of the book.
    • Author: The author of the book.
    • Link: The direct download link for the book's text.
    • Bookshelf: The category or genre assigned to the book on Project Gutenberg.
    • Text Data: The actual text content of the books, which can be downloaded using a provided script.

    Distribution

    The dataset's metadata is initially available in a gutenberg_metadata.csv file. The full text data for each book can be downloaded using a gutenberg_download.py script, which then saves the results into a CSV file. This final CSV file, containing the book texts, authors, titles, and categories, is approximately 5 GB in size. The corpus features more than 15,000 unique book texts.
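    Before downloading the full 5 GB of text, the metadata file alone can be summarised. Below is a minimal sketch, assuming gutenberg_metadata.csv has a header row with the Title, Author, Link and Bookshelf columns listed above (the exact header names are an assumption). Counting distinct links as well as rows is useful if a book can appear under more than one bookshelf, which is also an assumption.

```python
import csv
from collections import Counter
from typing import TextIO, Tuple


def summarize_metadata(handle: TextIO) -> Tuple[Counter, int]:
    """Return (rows per bookshelf, number of distinct download links).

    Assumes a header row containing "Bookshelf" and "Link" columns.
    """
    shelves: Counter = Counter()
    links = set()
    for row in csv.DictReader(handle):
        shelf = (row.get("Bookshelf") or "").strip() or "<none>"
        shelves[shelf] += 1
        link = (row.get("Link") or "").strip()
        if link:
            links.add(link)
    return shelves, len(links)
```

    Typical use: open the file with `open("gutenberg_metadata.csv", newline="", encoding="utf-8")`, pass the handle in, and print `shelves.most_common(10)` alongside the distinct-link count.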

    Usage

    This dataset is ideal for various applications in education and learning analytics. Specific use cases include:

    • Natural Language Processing (NLP) tasks, such as text analysis, topic modelling, and language understanding.
    • Literature studies and computational humanities research.
    • Developing and training AI and Machine Learning models on large text corpora.
    • Working with audio data, as some books are included as audiobooks.

    Coverage

    The dataset has global coverage, reflecting the diverse origins of books within Project Gutenberg. It includes only books that have been categorised on the Project Gutenberg website; uncategorised books are not included. No specific time range or demographic scope is detailed in the available information.

    License

    CC-BY-SA

    Who Can Use It

    This dataset is suitable for:

    • Researchers and academics focusing on text analysis, literary studies, or digital humanities.
    • Data scientists and machine learning engineers building and testing NLP models.
    • Students undertaking projects in linguistics, computer science, or library science.
    • Developers creating applications that require a large corpus of literary texts.

    Dataset Name Suggestions

    • Project Gutenberg Book Corpus
    • Digital Literature Collection
    • Classic Book Text Dataset
    • Historical Text Library

    Attributes

    Original Data Source: 15000 Gutenberg Books

  8. Data from: Book Reading Dataset

    • universe.roboflow.com
    zip
    Updated May 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    tim (2024). Book Reading Dataset [Dataset]. https://universe.roboflow.com/tim-4ijf0/book-reading
    Available download formats: zip
    Dataset updated
    May 4, 2024
    Dataset authored and provided by
    tim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Open Bounding Boxes
    Description

    Book Reading

    ## Overview

    Book Reading is a dataset for object detection tasks; it contains Open annotations for 357 images.

    ## Getting Started

    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  9. Retail Sales: Book Stores

    • fred.stlouisfed.org
    json
    Updated Jun 17, 2025
    (2025). Retail Sales: Book Stores [Dataset]. https://fred.stlouisfed.org/series/MRTSSM451211USN
    Available download formats: json
    Dataset updated
    Jun 17, 2025
    License

    https://fred.stlouisfed.org/legal/#copyright-public-domain

    Description

    Graph and download economic data for Retail Sales: Book Stores (MRTSSM451211USN) from Jan 1992 to Apr 2025 about book, retail trade, sales, retail, and USA.
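    FRED series such as MRTSSM451211USN can also be retrieved programmatically. Below is a minimal sketch using FRED's web API (the series/observations endpoint on api.stlouisfed.org), which requires a free API key; the `fetch_series` helper and its parsing choices are illustrative, not an official client. FRED marks missing observations with ".", which the parser maps to None.

```python
import json
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

FRED_OBSERVATIONS = "https://api.stlouisfed.org/fred/series/observations"


def observations_url(series_id: str, api_key: str) -> str:
    """Build a request URL for a series' observations in JSON."""
    query = urlencode({"series_id": series_id, "api_key": api_key, "file_type": "json"})
    return f"{FRED_OBSERVATIONS}?{query}"


def parse_observations(payload: Dict) -> List[Tuple[str, Optional[float]]]:
    """Convert the JSON response into (date, value) pairs; "." means missing."""
    return [
        (obs["date"], None if obs["value"] == "." else float(obs["value"]))
        for obs in payload.get("observations", [])
    ]


def fetch_series(series_id: str, api_key: str) -> List[Tuple[str, Optional[float]]]:
    """Download and parse a series (needs network access and an API key)."""
    with urlopen(observations_url(series_id, api_key)) as response:
        return parse_observations(json.load(response))
```

    For example, `fetch_series("MRTSSM451211USN", "YOUR_API_KEY")` would return the monthly book-store sales series as date/value pairs. Keeping parsing separate from the network call makes the parser easy to test offline.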

  10. Amazon Bestselling Books & Customer Reviews

    • opendatabay.com
    Updated Jul 2, 2025
    Datasimple (2025). Amazon Bestselling Books & Customer Reviews [Dataset]. https://www.opendatabay.com/data/ai-ml/1639fb85-1580-4646-8216-326b2fac3437
    Dataset updated
    Jul 2, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    Reviews & Ratings
    Description

    This dataset provides an in-depth look into Amazon's top 100 bestselling books along with their customer reviews, ratings, and pricing information. It offers a window into the world of popular reading and customer sentiment. The dataset was collected in November 2023, making it suitable for analysing recent literary trends and consumer behaviour.

    Columns

    The dataset includes the following fields:

    • Book Rank: The ranking of the book among the top 100 bestselling books on Amazon.
    • Book Title: The title of the book. Examples include "The Ballad of Songbirds and Snakes" and "Iron Flame".
    • Price: The price of the book in USD.
    • Rating: The overall rating of the book, on a scale of 1 to 5.
    • Author: The author of the book. Notable authors include Sarah J. Maas and Adam Wallace.
    • Year of Publication: The year in which the book was published.
    • Genre: The category to which the book belongs. Popular genres include Nonfiction and Childrens, literature.
    • URL: The direct URL link to the book on Amazon's platform.
    • Review Title: The title of the customer review.
    • Reviewer: The name of the person who wrote the review.
    • Reviewer Rating: The rating given by the reviewer for the book, on a scale of 1 to 5.
    • Review Description: The textual content of the review.
    • Is_verified: Indicates whether the review is a verified customer purchase.
    • Date: The date when the review was posted.
    • Timestamp: The timestamp indicating when the review was posted.
    • ASIN: Amazon Standard Identification Number assigned to products on Amazon.

    Distribution

    The dataset focuses on the top 100 bestselling books.

    • Price: Book prices range from 1.00 USD to 100.00 USD. There are approximately 10 books within each 9.90 USD price band across this range.
    • Rating: Overall book ratings are generally high, ranging from 4.10 to 5.00. A notable number of books have ratings between 4.73 and 4.82.
    • Year of Publication: Books in the dataset were published between 1947 and 2024. A significant portion, 64 books, were published between 2016 and 2024, indicating a strong presence of recent titles.
    • Genre: While diverse, Nonfiction and Childrens, literature are among the more prominent genres.
    • Authors/Titles: "The Ballad of Songbirds and Snakes" and "Iron Flame" are among the top-ranked titles. Sarah J. Maas and Adam Wallace are featured authors.

    The dataset covers review data for each of the top 100 books, though the exact number of reviews per book is not specified.
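    The price-band statement is consistent with simple arithmetic: (100.00 - 1.00) / 9.90 = 10 bands, and about 10 books per band gives roughly the 100 books in the dataset. Below is a minimal Python sketch of the binning, assuming half-open bands starting at 1.00 and that the 100.00 upper edge is folded into the last band (both choices are assumptions, not documented above).

```python
import math
from typing import List

# Range and band width taken from the distribution summary.
LOW, HIGH, WIDTH = 1.00, 100.00, 9.90


def band_edges(low: float = LOW, high: float = HIGH, width: float = WIDTH) -> List[float]:
    """Left edges of equal-width price bands covering [low, high]."""
    # round() guards against float noise when (high - low) / width is whole
    n_bands = math.ceil(round((high - low) / width, 9))
    return [round(low + i * width, 2) for i in range(n_bands)]


def band_index(price: float, low: float = LOW, high: float = HIGH,
               width: float = WIDTH, n_bands: int = 10) -> int:
    """Band [edge, edge + width) for a price; the top value joins the last band."""
    if not low <= price <= high:
        raise ValueError(f"price {price} outside [{low}, {high}]")
    return min(int((price - low) // width), n_bands - 1)


def counts_per_band(prices: List[float], n_bands: int = 10) -> List[int]:
    """Number of books in each price band."""
    counts = [0] * n_bands
    for price in prices:
        counts[band_index(price, n_bands=n_bands)] += 1
    return counts
```

    Applied to the Price column, `counts_per_band` should show roughly 10 books per band if the summary above holds.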

    Usage

    This dataset is ideal for:

    • Market analysis: Identifying bestselling trends, pricing strategies, and popular authors.
    • Sentiment analysis: Analysing customer reviews to understand public perception and extract insights.
    • Recommender systems: Building or improving book recommendation engines.
    • Natural Language Processing (NLP): Training models for text classification, entity recognition, or summarisation based on review content.
    • Data visualisation: Creating visualisations of literary trends, rating distributions, or reviewer behaviour.

    Coverage

    • Geographic Scope: The data pertains to the global Amazon marketplace.
    • Time Range: Book publication years span from 1947 to 2024. Review data was collected up to November 2023.

    License

    CC-BY

    Who Can Use It

    • Data scientists and analysts: For machine learning projects, statistical analysis, and predictive modelling.
    • Book enthusiasts and literary researchers: To explore popular reading habits and genre trends.
    • Publishers and authors: To gain insights into market demand and reader feedback.
    • Students and educators: For academic projects related to data science, literature, or consumer studies.

    Dataset Name Suggestions

    • Amazon Bestselling Books & Customer Reviews
    • Top 100 Amazon Books Data 2023
    • Amazon Literary Trends Dataset
    • Bestselling Book Reviews on Amazon

    Attributes

    Original Data Source: Top 100 Bestselling Book Reviews on Amazon

  11. Frequency of e-book downloading in the UK 2015-2022

    • statista.com
    • ai-chatbox.pro
    Updated Dec 10, 2024
    Statista (2024). Frequency of e-book downloading in the UK 2015-2022 [Dataset]. https://www.statista.com/statistics/291124/ebook-downloading-in-the-uk-by-frequency/
    Dataset updated
    Dec 10, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United Kingdom
    Description

    Data on e-book downloading among internet users in the United Kingdom found that in 2022, 18 percent of respondents had downloaded an e-book in the three months prior to the survey, the same share as in the previous year. However, purchasing remains the most popular way of accessing e-books, ahead of downloading or sharing.

  12. Shadow library book downloads, time, location, ISBN, title

    • uvaauas.figshare.com
    zip
    Updated Dec 4, 2020
    B. Bodó; Daniel Antal; Zoltán Puha (2020). Shadow library book downloads, time, location, ISBN, title [Dataset]. http://doi.org/10.21942/uva.12330959.v1
    Available download formats: zip
    Dataset updated
    Dec 4, 2020
    Dataset provided by
    University of Amsterdam / Amsterdam University of Applied Sciences
    Authors
    B. Bodó; Daniel Antal; Zoltán Puha
    License

    GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0.html

    Description

    Weblog dataset from a scholarly shadow library, distributed as a zipped comma-separated file. Fields:

    • date: Timestamp when the book was downloaded
    • lat: Latitude, redacted to 3 decimals
    • long: Longitude, redacted to 4 decimals
    • city: City of download
    • country: Country of download
    • isbn: ISBN of the downloaded book
    • title: Title of the downloaded book
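    The weblog can be read directly from the zip archive with Python's standard library. Below is a minimal sketch that counts downloads per country, assuming the archive contains one or more CSV files whose header row uses the field names listed above (the member file names are not documented here, so every .csv member is read).

```python
import csv
import io
import zipfile
from collections import Counter
from typing import BinaryIO, Union

# Field names as listed in the dataset description.
FIELDS = {"date", "lat", "long", "city", "country", "isbn", "title"}


def downloads_by_country(archive: Union[str, BinaryIO]) -> Counter:
    """Count downloads per country across every CSV file in the zip archive."""
    counts: Counter = Counter()
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip non-CSV members, if any
            with zf.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.DictReader(text)
                missing = FIELDS - set(reader.fieldnames or [])
                if missing:
                    raise ValueError(f"{name} lacks columns: {sorted(missing)}")
                for row in reader:
                    counts[row["country"] or "<unknown>"] += 1
    return counts
```

    Streaming through `TextIOWrapper` avoids extracting the archive to disk first; the header check fails loudly if the real column names differ from the documented ones.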

  13. BookCorpus Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Dec 19, 2021
    Yukun Zhu; Ryan Kiros; Richard Zemel; Ruslan Salakhutdinov; Raquel Urtasun; Antonio Torralba; Sanja Fidler (2021). BookCorpus Dataset [Dataset]. https://paperswithcode.com/dataset/bookcorpus
    Dataset updated
    Dec 19, 2021
    Authors
    Yukun Zhu; Ryan Kiros; Richard Zemel; Ruslan Salakhutdinov; Raquel Urtasun; Antonio Torralba; Sanja Fidler
    Description

    BookCorpus is a large collection of free novels written by unpublished authors. It contains 11,038 books (around 74M sentences and 1B words) across 16 sub-genres (e.g., Romance, Historical, Adventure).
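    The corpus sizes above imply the following rough averages. This is a back-of-the-envelope calculation that treats "around 74M sentences" and about 1 billion words as exact figures:

```latex
\begin{aligned}
\frac{1\times 10^{9}\ \text{words}}{74\times 10^{6}\ \text{sentences}} &\approx 13.5\ \text{words per sentence} \\
\frac{74\times 10^{6}\ \text{sentences}}{11{,}038\ \text{books}} &\approx 6{,}704\ \text{sentences per book} \\
\frac{1\times 10^{9}\ \text{words}}{11{,}038\ \text{books}} &\approx 90{,}600\ \text{words per book}
\end{aligned}
```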

  14. Book Publishers in United States - 5,599 Verified Listings Database

    • poidata.io
    csv, excel, json
    Updated Jul 7, 2025
    Poidata.io (2025). Book Publishers in United States - 5,599 Verified Listings Database [Dataset]. https://www.poidata.io/report/book-publisher/united-states
    Available download formats: json, csv, excel
    Dataset updated
    Jul 7, 2025
    Dataset provided by
    Poidata.io
    Area covered
    United States
    Description

    Comprehensive dataset of 5,599 book publishers in the United States as of July 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.

  15. Nasdaq Stock Market Data (Nasdaq TotalView-ITCH feed)

    • databento.com
    csv, dbn, json
    Updated Jan 14, 2025
    Databento (2025). Nasdaq Stock Market Data (Nasdaq TotalView-ITCH feed) [Dataset]. https://databento.com/datasets/XNAS.ITCH
    Available download formats: dbn, json, csv
    Dataset updated
    Jan 14, 2025
    Dataset provided by
    Databento Inc.
    Authors
    Databento
    Time period covered
    May 1, 2018 - Present
    Area covered
    United States
    Description

    Get Nasdaq real-time and historical data with support for fast market replay at over 19 million book updates per second. Test our data for free with only 4 lines of code.

    Nasdaq TotalView-ITCH is a proprietary data feed that disseminates full order book depth and last sale data from the Nasdaq stock market (XNAS). It delivers every quote and order at each price level, along with any event that updates the order book after an order is placed, such as trade executions, modifications, or cancellations. Nasdaq is the most active US equity exchange by volume and represented 13.03% of the average daily volume (ADV) as of January 2025.

    With its L3 granularity, Nasdaq TotalView-ITCH captures information beyond the L1, top-of-book data available through SIP feeds and enables more accurate modeling of book imbalances, trade directionality, quote lifetimes, and more. This includes explicit trade aggressor side, odd lots, auction imbalance data, and the Net Order Imbalance Indicator (NOII) for the Nasdaq Opening and Closing Crosses and Nasdaq IPO/Halt Cross—the best predictor of Nasdaq opening and closing prices available. Other key advantages of Nasdaq TotalView-ITCH over SIP data include faster real-time dissemination and precise exchange-side timestamping directly from Nasdaq.
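    To illustrate the difference between L3 (order-by-order) and L1 (top-of-book) data described above, the sketch below rebuilds a per-order book from add, modify, cancel and fill events and derives the best bid and offer from it. The event names, fields and integer price ticks are hypothetical simplifications, not the ITCH message set or Databento's MBO schema; a production book would also need queue priority and exchange-specific semantics.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Order:
    side: str   # "B" (bid) or "A" (ask)
    price: int  # integer price ticks avoid float comparison issues
    size: int


class OrderBook:
    """Per-order (L3) book built from simplified, hypothetical events."""

    def __init__(self) -> None:
        self.orders: Dict[int, Order] = {}
        # aggregated resting size per price level, per side
        self.levels: Dict[str, Dict[int, int]] = {"B": defaultdict(int), "A": defaultdict(int)}

    def _change_level(self, side: str, price: int, delta: int) -> None:
        level = self.levels[side]
        level[price] += delta
        if level[price] <= 0:
            del level[price]  # drop empty price levels

    def add(self, order_id: int, side: str, price: int, size: int) -> None:
        self.orders[order_id] = Order(side, price, size)
        self._change_level(side, price, size)

    def cancel(self, order_id: int, size: Optional[int] = None) -> None:
        """Remove `size` shares, or the whole order if size is None."""
        order = self.orders[order_id]
        removed = order.size if size is None else min(size, order.size)
        order.size -= removed
        self._change_level(order.side, order.price, -removed)
        if order.size == 0:
            del self.orders[order_id]

    def fill(self, order_id: int, size: int) -> None:
        """An execution reduces the resting order like a partial cancel."""
        self.cancel(order_id, size)

    def modify(self, order_id: int, price: int, size: int) -> None:
        """Treated as cancel + re-add; queue priority is not modelled."""
        side = self.orders[order_id].side
        self.cancel(order_id)
        if size > 0:
            self.add(order_id, side, price, size)

    def bbo(self) -> Tuple[Optional[int], Optional[int]]:
        """Best bid and best ask: the L1 view derived from the L3 book."""
        bids, asks = self.levels["B"], self.levels["A"]
        return (max(bids) if bids else None, min(asks) if asks else None)
```

    The point of the sketch is that the top-of-book view is a lossy projection: the book knows which order each share belongs to, while `bbo()` only reports the best prices.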

    Real-time Nasdaq TotalView-ITCH data is included with a Plus or Unlimited subscription through our Databento US Equities service. Historical data is available for usage-based rates or with any subscription. Visit our pricing page for more details or to upgrade your plan.

    Breadth of coverage: 20,329 products

    Asset class(es): Equities

    Origin: Directly captured at Equinix NY4 (Secaucus, NJ) with an FPGA-based network card and hardware timestamping. Synchronized to UTC with PTP.

    Supported data encodings: DBN, CSV, JSON

    Supported market data schemas: MBO, MBP-1, MBP-10, BBO-1s, BBO-1m, TBBO, Trades, OHLCV-1s, OHLCV-1m, OHLCV-1h, OHLCV-1d, Definition, Statistics, Status, Imbalance

    Resolution: Immediate publication, nanosecond-resolution timestamps

  16. opus_books

    • huggingface.co
    Updated Mar 29, 2024
    Language Technology Research Group at the University of Helsinki (2024). opus_books [Dataset]. https://huggingface.co/datasets/Helsinki-NLP/opus_books
    Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 29, 2024
    Dataset authored and provided by
    Language Technology Research Group at the University of Helsinki
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Card for OPUS Books

    Dataset Summary

    This is a collection of copyright-free books aligned by Andras Farkas, available from http://www.farkastranslations.com/bilingual_books.php. Note that the texts are rather dated due to copyright issues and that some of them are manually reviewed (check the metadata at the top of the corpus files in XML). The source is multilingually aligned and available from http://www.farkastranslations.com/bilingual_books.php.… See the full description on the dataset page: https://huggingface.co/datasets/Helsinki-NLP/opus_books.
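    Below is a minimal sketch for turning OPUS Books records into sentence pairs for translation work, assuming each record has the `{"id": ..., "translation": {"en": ..., "fr": ...}}` layout used by Hugging Face translation datasets. Loading the data would use the `datasets` library, e.g. `load_dataset("Helsinki-NLP/opus_books", "en-fr", split="train")`; the `en-fr` configuration name is an assumption to check against the dataset page.

```python
from typing import Iterable, Iterator, Mapping, Tuple


def sentence_pairs(records: Iterable[Mapping], src: str, tgt: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from translation records.

    Assumes each record looks like {"id": ..., "translation": {src: ..., tgt: ...}}.
    """
    for record in records:
        pair = record["translation"]
        source = (pair.get(src) or "").strip()
        target = (pair.get(tgt) or "").strip()
        if source and target:  # skip empty or one-sided alignments
            yield source, target
```

    Because the function only needs an iterable of mappings, it works the same on a loaded Hugging Face split or on records read from another source.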

  17. Audible Dataset

    • kaggle.com
    Updated Apr 11, 2022
    Snehangsu De (2022). Audible Dataset [Dataset]. https://www.kaggle.com/datasets/snehangsude/audible-dataset
    Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 11, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Snehangsu De
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction

    As audiobooks grow in popularity, I gathered this data to understand how the audiobook market has grown over the years. From authors to release dates, the data covers the key details of audiobooks released from 1998 to 2025 (including pre-planned releases).

    I had not found a good audiobook dataset, so I set out to build one that provides basic and historical information on audiobooks. I plan to add more details in the near future.

    File Information

    The uncleaned data (audible_uncleaned.csv) is exactly the raw data I derived from Audible.in. The cleaned data (audible_cleaned.csv) applies a few basic data-cleaning steps.

    Libraries used

    The data was collected using web scraping with the following libraries:

    • re
    • Beautiful Soup
    • Selenium

    Beautiful Soup and Selenium were used in unison to mainly gather the data. The code can be re-used and you can find the code here: https://github.com/snehangsude/audible_scraper

    Column Breakdown

    • name: Name of the audiobook
    • author: Author of the audiobook
    • narrator: Narrator of the audiobook
    • time: Length of the audiobook
    • releasedate: Release date of the audiobook
    • language: Language of the audiobook
    • stars: Number of stars the audiobook received
    • price: Price of the audiobook in INR
    • ratings: Number of reviews received by the audiobook
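    The time column stores each audiobook's length as text, so it has to be converted before analysis. Below is a minimal cleaning sketch, assuming raw values look like "2 hrs and 20 mins", "1 hr and 5 mins" or "Less than 1 minute"; these formats are assumptions about audible_uncleaned.csv, not documented above.

```python
import re
from typing import Optional

# Assumed patterns such as "2 hrs", "1 hr", "20 mins", "1 min".
_HOURS = re.compile(r"(\d+)\s*hrs?\b")
_MINUTES = re.compile(r"(\d+)\s*mins?\b")


def duration_minutes(text: Optional[str]) -> Optional[int]:
    """Convert an assumed Audible-style length string to whole minutes."""
    if not text:
        return None
    text = text.lower()
    if "less than 1 minute" in text:
        return 0
    hours = _HOURS.search(text)
    minutes = _MINUTES.search(text)
    if not hours and not minutes:
        return None  # unrecognised format; leave for manual review
    total = int(hours.group(1)) * 60 if hours else 0
    return total + (int(minutes.group(1)) if minutes else 0)
```

    Returning None for unrecognised strings, rather than guessing, makes it easy to find rows whose format differs from the assumed one.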

  18. Books in Thailand - 1 Verified Listings Database

    • poidata.io
    csv, excel, json
    Updated Jul 4, 2025
    Poidata.io (2025). Books in Thailand - 1 Verified Listings Database [Dataset]. https://www.poidata.io/report/books/thailand
    Available download formats: excel, json, csv
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Poidata.io
    Area covered
    Thailand
    Description

    Dataset of 1 listing in the Books category in Thailand as of July 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.

  19. Evaluating the impact of the FWF-E-Book-Library collection in the OAPEN...

    • ssh.datastations.nl
    ods, pdf, tsv, zip
    Updated Mar 25, 2015
    Cite
    R. Snijder (2015). Evaluating the impact of the FWF-E-Book-Library collection in the OAPEN Library [Dataset]. http://doi.org/10.17026/DANS-ZM7-X6E9
    Explore at:
    Available download formats: ods (1085463), zip (15133), pdf (1453317), tsv (12818)
    Dataset updated
    Mar 25, 2015
    Dataset provided by
    DANS Data Station Social Sciences and Humanities
    Authors
    R. Snijder
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Measuring scholarly impact and societal relevance in the humanities and social sciences can be done in several ways. Here we look at a collection of e-books from the FWF-E-Book-Library, which is made available through the OAPEN Library. In 2014, 146 books of the FWF-E-Book-Library collection were made available via the OAPEN Library. The analysis is based on COUNTER-compliant download data, which means that downloads by automated systems ('bots') and other suspicious download behaviour are excluded from the reports. The 28,139 downloads used for this analysis originated from 23,652 IP addresses. Many providers clearly use several IP addresses: the IP addresses were linked to 2,839 provider names. Where no information about a provider could be found, the download data was omitted.
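The aggregation step described above (link each IP address to a provider, drop downloads with no known provider) can be sketched as follows. This is an illustration under assumptions, not the author's procedure: the input shape of `(ip, downloads)` pairs plus an IP-to-provider mapping is hypothetical, as is the name `downloads_per_provider`, and the records are assumed to be COUNTER-filtered already.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def downloads_per_provider(
    records: Iterable[Tuple[str, int]],
    provider_of_ip: Mapping[str, str],
) -> Counter:
    """Sum (ip, downloads) records per provider, omitting IPs with no known provider."""
    totals: Counter = Counter()
    for ip, downloads in records:
        provider = provider_of_ip.get(ip)
        if provider is None:
            continue  # no provider information: the record is omitted
        totals[provider] += downloads
    return totals
```

With the figures above, that works out to roughly 1.19 downloads per IP address and about 8.3 IP addresses per provider on average.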

  20. Book Publication, Editions for United States

    • fred.stlouisfed.org
    json
    Updated Aug 15, 2012
    Cite
    (2012). Book Publication, Editions for United States [Dataset]. https://fred.stlouisfed.org/series/M0106AUSM234NNBR
    Explore at:
    Available download formats: json
    Dataset updated
    Aug 15, 2012
    License

    https://fred.stlouisfed.org/legal/#copyright-citation-required

    Area covered
    United States
    Description

    Graph and download economic data for Book Publication, Editions for United States (M0106AUSM234NNBR), from Jan 1913 to Dec 1928.
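Besides the website download, FRED series can be retrieved programmatically. A minimal sketch, assuming FRED's `fred/series/observations` web API endpoint, a personal API key, and a JSON response whose `observations` entries carry `date` and `value` strings with "." marking a missing value (details from FRED's API documentation, not from this listing). The helper names are hypothetical, and the sample payload in the test is made up, not real series data.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple

FRED_OBSERVATIONS = "https://api.stlouisfed.org/fred/series/observations"


def observations_url(series_id: str, api_key: str) -> str:
    """Build the request URL for one series in JSON format."""
    query = urllib.parse.urlencode(
        {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    )
    return f"{FRED_OBSERVATIONS}?{query}"


def parse_observations(payload: str) -> List[Tuple[str, Optional[float]]]:
    """Turn the JSON response into (date, value) pairs; "." becomes None."""
    data = json.loads(payload)
    out: List[Tuple[str, Optional[float]]] = []
    for obs in data.get("observations", []):
        value = obs.get("value")
        out.append((obs["date"], None if value in (None, ".") else float(value)))
    return out


def fetch_series(series_id: str, api_key: str) -> List[Tuple[str, Optional[float]]]:
    """Download and parse a series (needs network access and a valid key)."""
    with urllib.request.urlopen(observations_url(series_id, api_key)) as resp:
        return parse_observations(resp.read().decode("utf-8"))
```

Calling `fetch_series("M0106AUSM234NNBR", key)` with your own key would return the monthly observations for this series.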

Cite
Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096

Best Books Ever Dataset

Explore at:
3 scholarly articles cite this dataset.
Available download formats: csv
Dataset updated
Nov 10, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Lorena Casanova Lozano; Sergio Costa Planells
License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically

Description

The dataset was collected as part of Prac1 of the course Typology and Data Life Cycle in the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).

The dataset contains 25 variables and 52,478 records corresponding to books on the Goodreads Best Books Ever list (the largest list on the site).

The original code used to retrieve the dataset is available in the GitHub repository github.com/scostap/goodreads_bbe_dataset.

The data was retrieved in two batches: the first 30,000 books and then the remaining 22,478. Dates in the second batch were not parsed and reformatted, so publishDate and firstPublishDate are in mm/dd/yyyy format for the first 30,000 records and in "Month Day Year" format for the rest.
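Because the two batches use different date formats, a loader has to accept both. A minimal sketch, assuming the second batch looks like "September 14 2008" and may carry ordinal suffixes such as "14th" (an assumption; the description only says "Month Day Year"), and assuming English month names under Python's default locale. The function name `parse_publish_date` is hypothetical. Trying every format on every row, rather than switching at record 30,000, keeps the parser independent of row order.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumption: second-batch dates may carry ordinal suffixes ("14th");
# the dataset description only says "Month Day Year".
_ORDINAL = re.compile(r"(\d+)(st|nd|rd|th)\b")

# First batch: mm/dd/yyyy; second batch: full month name, day, year.
_FORMATS = ("%m/%d/%Y", "%B %d %Y")


def parse_publish_date(raw: Optional[str]) -> Optional[date]:
    """Parse a publishDate/firstPublishDate value written in either format.

    Returns None for empty values or anything matching neither format
    (for example a bare year).
    """
    if not raw:
        return None
    text = _ORDINAL.sub(r"\1", raw.strip())  # "14th" -> "14"
    text = " ".join(text.replace(",", " ").split())  # normalise commas and spaces
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```

Under these assumptions, "09/14/2008" and "September 14th 2008" both parse to `date(2008, 9, 14)`.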

Book cover images can optionally be downloaded from the URL in the coverImg field. Python code for doing so, with an example, is available in the GitHub repository.
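The repository's own script is the reference for this step. As an independent sketch using only the standard library, each cover can be saved under a file name derived from bookId. The helper names `cover_path` and `download_cover` and the `covers` directory are hypothetical, and the ".jpg" fallback for URLs without an extension is an assumption.

```python
import os
import urllib.request
from typing import Optional
from urllib.parse import urlparse


def cover_path(book_id: str, url: str, out_dir: str = "covers") -> str:
    """Local file name for a cover: bookId plus the URL's extension (default .jpg)."""
    ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
    # Replace characters that are awkward in file names.
    safe_id = "".join(c if c.isalnum() or c in "-_." else "_" for c in book_id)
    return os.path.join(out_dir, safe_id + ext)


def download_cover(book_id: str, url: str, out_dir: str = "covers") -> Optional[str]:
    """Download one cover image; return the saved path, or None if coverImg is empty."""
    if not url:
        return None  # coverImg is about 99% complete, so some rows have no URL
    os.makedirs(out_dir, exist_ok=True)
    path = cover_path(book_id, url, out_dir)
    urllib.request.urlretrieve(url, path)
    return path
```

Looping over `csv.DictReader` rows of the CSV file and calling `download_cover(row["bookId"], row["coverImg"])` would fetch every available cover.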

The 25 fields of the dataset are:

| Attributes | Definition | Completeness (%) |
| ------------- | ------------- | ------------- | 
| bookId | Book Identifier as in goodreads.com | 100 |
| title | Book title | 100 |
| series | Series Name | 45 |
| author | Book's author | 100 |
| rating | Global Goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (ex. Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Publisher | 93 |
| publishDate | Publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Total number of ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field: percent of ratings over 2 stars (as on Goodreads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL to cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
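Since likedPercent is derived from the star breakdown (ratings over 2 stars, i.e. 3, 4 or 5), it can be recomputed as a consistency check. A minimal sketch, assuming the ratingsByStars field has already been parsed into a `{stars: count}` mapping; the order of the values in the raw field is not documented here, so verify it before building that mapping. The function name `liked_percent` is hypothetical, and the stored field may be rounded.

```python
from typing import Mapping, Optional


def liked_percent(ratings_by_stars: Mapping[int, int]) -> Optional[float]:
    """Percentage of ratings above 2 stars, i.e. 3, 4 or 5 stars."""
    total = sum(ratings_by_stars.values())
    if total == 0:
        return None  # no ratings: the percentage is undefined
    liked = sum(count for stars, count in ratings_by_stars.items() if stars > 2)
    return 100.0 * liked / total
```

For example, `{5: 50, 4: 30, 3: 10, 2: 5, 1: 5}` gives 90.0.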
