17 datasets found

Books dataset, ISBN based
kaggle.com
zip
Updated Oct 13, 2025
Goulven Furet (2025). Books dataset, ISBN based [Dataset]. https://www.kaggle.com/datasets/goulvenfuret/books-dataset-isbn-based
Available download formats: zip (367,961,043 bytes)
Dataset updated
Oct 13, 2025
Authors
Goulven Furet
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Nudger is a responsible price comparison tool registered as a public good. This project follows an open-source and open-data approach, featuring open datasets on books and products that are accessible to everyone.

The data shared by Nudger primarily covers the French market.

ISBN Dataset: Contains information on over 6 million books identified by their ISBN numbers.

Nudger is an open and growing project. Feel free to contact us with any questions!
Best Books Ever Dataset
zenodo.org
csv
Updated Nov 10, 2020
+ more versions
Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.4265096
Dataset updated
Nov 10, 2020
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Lorena Casanova Lozano; Sergio Costa Planells
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The dataset was collected as part of Prac1 of the course Typology and Data Life Cycle in the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).

The dataset contains 25 variables and 52,478 records corresponding to books on the Goodreads Best Books Ever list (the largest list on the site).

The original code used to retrieve the dataset is available in the GitHub repository github.com/scostap/goodreads_bbe_dataset.

The data was retrieved in two batches: the first 30,000 books and then the remaining 22,478. Dates were not parsed and reformatted in the second batch, so publishDate and firstPublishDate are in mm/dd/yyyy format for the first 30,000 records and in Month Day Year format for the rest.
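Because publishDate and firstPublishDate mix two formats, a parser that tries each documented format in turn can normalize them. The following is a minimal sketch: the function name `parse_bbe_date` is hypothetical, and the exact spelling of the "Month Day Year" values (for example whether days carry ordinal suffixes such as "14th") is an assumption, so unrecognized values are returned as `None` rather than guessed.

```python
import re
from datetime import date, datetime
from typing import Optional

# The two layouts named in the dataset description.
DATE_FORMATS = ("%m/%d/%Y", "%B %d %Y")
# Assumption: some "Month Day Year" values may carry ordinal suffixes ("14th").
ORDINAL_SUFFIX = re.compile(r"(\d+)(st|nd|rd|th)\b")


def parse_bbe_date(raw: Optional[str]) -> Optional[date]:
    """Parse a publishDate/firstPublishDate value in either documented format."""
    if raw is None or not raw.strip():
        return None
    text = ORDINAL_SUFFIX.sub(r"\1", raw.strip())
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next documented format
    return None  # leave unrecognized values missing instead of guessing


print(parse_bbe_date("09/14/2008"))         # layout of the first 30,000 records
print(parse_bbe_date("September 14 2008"))  # layout of the remaining records
```

With pandas, missing cells are floats (NaN), so filling them first keeps the function's string contract, e.g. `df["publishDate"].fillna("").map(parse_bbe_date)`.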

Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, with an example, is available in the GitHub repository.

The 25 fields of the dataset are:

| Attribute | Definition | Completeness (%) |
| ------------- | ------------- | ------------- |
| bookId | Book identifier as on goodreads.com | 100 |
| title | Book title | 100 |
| series | Series name | 45 |
| author | Book's author | 100 |
| rating | Global Goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (e.g. Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Publisher | 93 |
| publishDate | Publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Total number of ratings | 100 |
| ratingsByStars | Number of ratings per star level | 97 |
| likedPercent | Derived field: percentage of ratings above 2 stars (as on Goodreads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL of the cover image | 99 |
| bbeScore | Score in the Best Books Ever list | 100 |
| bbeVotes | Number of votes in the Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
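The likedPercent definition above can be reproduced from per-star counts. This is a minimal sketch: it assumes the star counts have already been parsed into a dictionary keyed by star value (how ratingsByStars is serialized in the CSV is not documented here), reads "above 2 stars" as ratings of 3, 4 or 5 stars, and uses a hypothetical function name.

```python
from typing import Mapping, Optional


def liked_percent(ratings_by_star: Mapping[int, int]) -> Optional[float]:
    """Percentage of ratings above 2 stars (3, 4 or 5), or None if unrated."""
    total = sum(ratings_by_star.values())
    if total == 0:
        return None
    liked = sum(count for stars, count in ratings_by_star.items() if stars > 2)
    return 100 * liked / total


# Hypothetical counts for a single book, keyed by star value.
counts = {5: 500, 4: 300, 3: 100, 2: 60, 1: 40}
print(round(liked_percent(counts)))  # 90
```

Comparing this value with the stored likedPercent column is a quick consistency check; small differences may come from rounding.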
Books Dataset
kaggle.com
zip
Updated Dec 20, 2023
Elvin Rustamov (2023). Books Dataset [Dataset]. https://www.kaggle.com/datasets/elvinrustam/books-dataset
Available download formats: zip (55,469,565 bytes)
Dataset updated
Dec 20, 2023
Authors
Elvin Rustamov
License
CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
Description
Overview: This dataset comprises information scraped from wonderbk.com, a popular online bookstore. The dataset contains details of 103,063 books, with key attributes such as title, authors, description, category, publisher, starting price, and publish date.

Columns:

Title: The title of the book.

Authors: The authors of the book.

Description: A brief description of the book.

Category: The category or genre to which the book belongs.

Publisher: The publishing house responsible for the book.

Price Starting With ($): The initial price of the book.

Publish Date (Month): The month in which the book was published.

Publish Date (Year): The year of publication.
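Because the publication month and year are stored in separate columns, combining them into a single date makes time-based analysis easier. A minimal pandas sketch, assuming the month column holds full English month names (e.g. "January") and the year column holds integers; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows using the column names listed above.
books = pd.DataFrame({
    "Title": ["Book A", "Book B", "Book C"],
    "Publish Date (Month)": ["January", "March", None],
    "Publish Date (Year)": [1993, 2005, 2010],
})

# Join month and year; str.cat yields NaN where either part is missing.
month_year = books["Publish Date (Month)"].str.cat(
    books["Publish Date (Year)"].astype(str), sep=" "
)
# Parse to the first day of that month; unparseable values become NaT.
books["publish_date"] = pd.to_datetime(month_year, format="%B %Y", errors="coerce")
print(books[["Title", "publish_date"]])
```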
Amazon Books details for computer science
kaggle.com
zip
Updated Sep 26, 2023
Muhammad Uzair Khan (2023). Amazon Books details for computer science [Dataset]. https://www.kaggle.com/datasets/uzair01/amazon-books
Available download formats: zip (30,197 bytes)
Dataset updated
Sep 26, 2023
Authors
Muhammad Uzair Khan
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Context: The "Amazon Books Dataset" is a curated collection of data on books available on Amazon, with a primary focus on computer science. It is intended for researchers, industry professionals, and enthusiasts interested in computer science literature, publishing trends, and the broader book market. Because computer science drives much of modern technological innovation, the Amazon market for computer science books is relevant to stakeholders in the publishing, technology, and retail industries.

Data Source: The data was collected from Amazon's book catalog, focusing on books related to computer science. It reflects the catalog as of the dataset's creation date; attributes such as ratings and prices may have changed since then, and new books may have been published.

Content: For each book, the dataset provides the following columns:

Title: The title of the book.

Description: A concise overview of the book's content and themes.

Author: The name(s) of the book's author(s).

ISBN-10: The 10-digit International Standard Book Number.

ISBN-13: The 13-digit International Standard Book Number.

Publish Date: The date on which the book was published.

Edition: The book's edition, if applicable.

Best Seller: A binary indicator (1 or 0) of whether the book has best-seller status.

Top Rated: A binary indicator (1 or 0) of whether the book has received high ratings from readers.

Rating: The average rating given to the book by Amazon customers.

Review Count: The total number of user reviews posted for the book.

Price: The book's price on Amazon at the time the data was collected.
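Since the dataset carries both ISBN-10 and ISBN-13 columns, a check-digit test is a quick way to catch scraping errors, and missing ISBN-13 values can be derived from valid ISBN-10s. The sketch below applies the standard ISBN check-digit rules (modulus 11 with weights 10 to 1 for ISBN-10; alternating weights 1 and 3, modulus 10, for ISBN-13). The function names are hypothetical, and only the 978 prefix is used because every ISBN-10 maps to a 978-prefixed ISBN-13.

```python
def _clean(isbn: str) -> str:
    """Drop hyphens and spaces; normalize a trailing 'x' to 'X'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn10_is_valid(isbn10: str) -> bool:
    """ISBN-10: weights 10..1, weighted sum divisible by 11; 'X' stands for 10."""
    s = _clean(isbn10)
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    total = sum((10 - i) * int(ch) for i, ch in enumerate(s[:9]))
    total += 10 if s[9] == "X" else int(s[9])
    return total % 11 == 0


def isbn13_check_digit(first12: str) -> int:
    """ISBN-13: weights alternate 1, 3; the check digit rounds the sum up to a multiple of 10."""
    weighted = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(first12))
    return (10 - weighted % 10) % 10


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a valid ISBN-10 to its 978-prefixed ISBN-13."""
    if not isbn10_is_valid(isbn10):
        raise ValueError(f"invalid ISBN-10: {isbn10!r}")
    first12 = "978" + _clean(isbn10)[:9]
    return first12 + str(isbn13_check_digit(first12))


print(isbn10_to_isbn13("0-306-40615-2"))  # 9780306406157
```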

Potential Use Cases: This dataset, with its focus on computer science literature, presents a unique opportunity for research and analysis in fields such as:

Technology Trends: Analyzing emerging trends in computer science literature, including topics, technologies, and subfields.

Educational Insights: Investigating the popularity of computer science textbooks and learning materials.

Author Impact: Assessing the influence and contributions of authors within the computer science domain.

Market Analysis: Understanding pricing dynamics and factors influencing the success of computer science books.

Researchers, analysts, and computer science enthusiasts can leverage this dataset to explore the intricate world of computer science literature, uncover market trends, and contribute to a deeper understanding of the evolving landscape of technology and knowledge dissemination on one of the world's largest online marketplaces.
Crossref metadata of COCI bibliographic resources, as of November 2018 and...
data-staging.niaid.nih.gov
data.niaid.nih.gov
+1more
Updated Jan 24, 2020
Zhu, Yongjun; Yan, Erjia; Peroni, Silvio; Che, Chao (2020). Crossref metadata of COCI bibliographic resources, as of November 2018 and LCC categories of the ISBN entities in the dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_3241744
Dataset updated
Jan 24, 2020
Dataset provided by
College of Computing and Informatics, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA
Digital Humanities Advanced Research Centre (DHARC), Department of Classical Philology and Italian Studies, University of Bologna, Via Zamboni 32, 40126 Bologna, Italy
Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian, China
Department of Library and Information Science, Sungkyunkwan University, 25-2, Sungkyunkwan-ro, Jongno-gu, Seoul, Republic of Korea
Authors
Zhu, Yongjun; Yan, Erjia; Peroni, Silvio; Che, Chao
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The all.zip archive contains a CSV file with citation counts obtained from the November 2018 dump of COCI (https://doi.org/10.6084/m9.figshare.6741422.v3), together with some metadata (title, DOI, number of authors, ISBN, ISBN of the container, type of bibliographic resource) of the related citing and cited entities. This metadata was obtained from the Crossref dump downloaded in October 2018, which is the same dump used to create the COCI data.

In addition, it contains all the Library of Congress Classification (LCC) categories associated with each ISBN in the previous dataset (file isbn_cat_lcc.csv), as retrieved using the services at http://classify.oclc.org/classify2/api_docs/index.html. Two ancillary mapping files have also been added: ddc_to_lcc_mapping.csv, for converting Dewey Decimal Classification (DDC) categories into LCC categories when the service above returned only DDC categories for an ISBN; and lcc_to_wos_mapping.csv, for mapping each LCC category to the related Web of Science research area.
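The two mapping files suggest a two-step lookup: fall back to the DDC-to-LCC mapping when an ISBN has only a DDC category, then map each LCC category to a Web of Science research area. A minimal pandas sketch of that lookup; the column names (`isbn`, `lcc`, `ddc`, `wos_area`), the sample ISBNs and codes, and the area labels are hypothetical placeholders, since the files' actual layouts are not described here.

```python
import pandas as pd

# Hypothetical miniature versions of the three files. The real column
# names and category codes may differ; adjust them after inspecting the CSVs.
isbn_cat = pd.DataFrame({
    "isbn": ["9780000000001", "9780000000002"],
    "lcc": ["QA", None],   # the second ISBN only received a DDC category
    "ddc": [None, "500"],
})
ddc_to_lcc = pd.DataFrame({"ddc": ["500"], "lcc": ["Q"]})
lcc_to_wos = pd.DataFrame({
    "lcc": ["QA", "Q"],
    "wos_area": ["Mathematics", "Science & Technology - Other Topics"],
})

# Step 1: use the DDC-to-LCC mapping only where no LCC category was returned.
filled = isbn_cat.merge(ddc_to_lcc, on="ddc", how="left", suffixes=("", "_from_ddc"))
filled["lcc"] = filled["lcc"].fillna(filled["lcc_from_ddc"])

# Step 2: attach the Web of Science research area for each LCC category.
result = filled.merge(lcc_to_wos, on="lcc", how="left")[["isbn", "lcc", "wos_area"]]
print(result)
```

A left join keeps every ISBN even when no mapping exists, so unmapped rows surface as missing values instead of disappearing.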
Book covers dataset
kaggle.com
zip
Updated Jan 3, 2020
Luka Anicin (2020). Book covers dataset [Dataset]. https://www.kaggle.com/datasets/lukaanicin/book-covers-dataset/discussion
Available download formats: zip (288,287,453 bytes)
Dataset updated
Jan 3, 2020
Authors
Luka Anicin
License
CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

A few months ago, I started working on a side project that lets users search for books by taking a picture of a book cover. The main obstacle at the start was the poor quality of the book cover datasets available online, so I created this one.

For example, this dataset can be used to build recommendation and Content-Based Image Retrieval (CBIR) systems.

Content

main_dataset.csv

This CSV file contains all meta information for each book in the dataset.

image - URLs of the book covers. Use these URLs to download the images yourself if needed.

name - Title of the book.

author - Author of the book.

format - Physical format of the book (e.g. paperback).

book_depository_stars - Book's rating on bookdepository.com. (NOTE: Because the data was scraped before you download the dataset, this value may differ from the one currently on the website.)

price - Book's current price on bookdepository.com. (NOTE: Because the data was scraped before you download the dataset, this value may differ from the one currently on the website.)

currency - Currency of the prices in the dataset.

old_price - Book's previous price (if any) on bookdepository.com. (NOTE: Because the data was scraped before you download the dataset, this value may differ from the one currently on the website.)

isbn - ISBN of the book.

category - Category of the book on bookdepository.com.

img_paths - Local path of the book's cover image (after scraping).

book-covers

This folder contains all the book covers in .jpg format, sorted into category folders. The dataset contains 33 classes (book categories), each with close to 1,000 images, so it is fairly balanced. NOTE: Extract this data into a folder called dataset so that it matches the paths provided in the main_dataset.csv file.
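Before training a classifier or CBIR model on the covers, it is worth confirming the class balance described above and checking that every img_paths entry resolves to an extracted file. A minimal pandas sketch; it assumes img_paths are relative to the directory that contains the dataset folder, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import List

import pandas as pd


def category_balance(meta: pd.DataFrame, column: str = "category") -> pd.Series:
    """Number of books per category, smallest first, to check class balance."""
    return meta[column].value_counts().sort_values()


def missing_images(meta: pd.DataFrame, root: str = ".") -> List[str]:
    """img_paths entries that do not exist as files under `root`."""
    return [p for p in meta["img_paths"] if not (Path(root) / p).is_file()]


# Usage with the real files (assumes main_dataset.csv sits next to the
# extracted "dataset" folder, so img_paths resolve relative to that directory):
# meta = pd.read_csv("main_dataset.csv")
# print(category_balance(meta).head())
# print(len(missing_images(meta)), "cover files not found")
```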

Acknowledgements

All data in this dataset was scraped from https://www.bookdepository.com/. (I am not affiliated with them in any way; it's just a great website 👍)
Open e-commerce 1.0: Five years of crowdsourced U.S. Amazon purchase...
search.dataone.org
Updated Dec 16, 2023
Alex Berke; Dan Calacci; Robert Mahari; Takahiro Yabe; Kent Larson; Sandy Pentland (2023). Open e-commerce 1.0: Five years of crowdsourced U.S. Amazon purchase histories with user demographics [Dataset]. http://doi.org/10.7910/DVN/YGLYDY
Unique identifier
https://doi.org/10.7910/DVN/YGLYDY
Dataset updated
Dec 16, 2023
Dataset provided by
Harvard Dataverse
Authors
Alex Berke; Dan Calacci; Robert Mahari; Takahiro Yabe; Kent Larson; Sandy Pentland
Description
This dataset contains longitudinal purchase data from 5,027 Amazon.com users in the US, spanning 2018 through 2022 (amazon-purchases.csv). It also includes demographic data and other consumer-level variables for each user in the dataset. These variables were collected through an online survey and are included in survey.csv; fields.csv describes the columns of survey.csv, where each field/survey column corresponds to a survey question. The dataset also contains the survey instrument used to collect the data; more details about the survey questions, the possible responses, and the format in which they were presented can be found by viewing it.

A 'Survey ResponseID' column is present in both amazon-purchases.csv and survey.csv. It links a user's survey responses to their Amazon.com purchases and was randomly generated at the time of data collection.

amazon-purchases.csv: each row corresponds to an Amazon order and has the following columns:
- Survey ResponseID
- Order date
- Shipping address state
- Purchase price per unit
- Quantity
- ASIN/ISBN (Product Code)
- Title
- Category

The data were exported by the Amazon users from Amazon.com and shared with their informed consent. PII and other information not listed above were stripped from the data; this processing occurred on the users' machines before the data was shared with researchers.
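Because 'Survey ResponseID' links the two files, per-user spending can be joined to survey answers. A minimal pandas sketch with hypothetical sample rows: the column names follow the description, but their exact spelling in the CSV headers, the survey column `age_group`, and treating each row's cost as unit price times quantity are assumptions.

```python
import pandas as pd

# Hypothetical rows; column names follow the description above, but the exact
# header spellings in amazon-purchases.csv and survey.csv are assumptions.
purchases = pd.DataFrame({
    "Survey ResponseID": ["R1", "R1", "R2"],
    "Purchase price per unit": [12.50, 3.00, 20.00],
    "Quantity": [2, 1, 1],
})
survey = pd.DataFrame({
    "Survey ResponseID": ["R1", "R2"],
    "age_group": ["25-34", "35-44"],  # hypothetical survey column
})

# Order value = unit price x quantity, summed per respondent.
purchases["order_total"] = purchases["Purchase price per unit"] * purchases["Quantity"]
spend = purchases.groupby("Survey ResponseID", as_index=False)["order_total"].sum()

# Attach survey answers; validate guards against duplicate IDs on either side.
per_user = spend.merge(survey, on="Survey ResponseID", how="left", validate="one_to_one")
print(per_user)

# With the real files:
# purchases = pd.read_csv("amazon-purchases.csv")
# survey = pd.read_csv("survey.csv")
```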
BigBox API | Home Depot Product & Search Results Data
datarade.ai
.json, .csv
Updated Nov 28, 2022
Traject Data (2022). BigBox API | Home Depot Product & Search Results Data [Dataset]. https://datarade.ai/data-products/bigbox-api-home-depot-product-search-results-data-traject-data
Available download formats: .json, .csv
Dataset updated
Nov 28, 2022
Dataset authored and provided by
Traject Data
Area covered
United States of America
Description
BigBox API provides reliable, real-time Home Depot product, category, review, and offer data, covering each search result in a cleanly structured output.

You can originate a request from any US zip code to see results as they would appear to customers in that location (e.g. shipping information). BigBox API's high-capacity, global infrastructure is built for performance and reliability. For easy integration with your Home Depot data apps and services, data is delivered in JSON or CSV format.

Data is retrieved by search term, search results page URL, or for single products, by the Home Depot item ID or by global identifiers such as GTIN, ISBN, UPC and EAN. GTIN-based requests work by looking up the GTIN/ISBN/UPC on Home Depot first, then retrieving the product details for the first matching item ID.
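Since requests can be keyed by GTIN, ISBN, UPC or EAN, it can be worth validating identifiers locally before spending requests on them. These identifiers share the GS1 modulus-10 check digit (an ISBN-13 is a GTIN-13 and a UPC-A is a GTIN-12), so one function covers them. This is a minimal sketch independent of BigBox API: it does not call the API, and the function name is hypothetical.

```python
def gtin_is_valid(code: str) -> bool:
    """Validate the check digit of a GTIN-8, GTIN-12 (UPC-A), GTIN-13 (EAN-13, ISBN-13) or GTIN-14."""
    digits = code.replace("-", "").replace(" ", "")
    if len(digits) not in (8, 12, 13, 14) or not digits.isdigit():
        return False
    body, check = digits[:-1], int(digits[-1])
    # Weights alternate 3, 1, 3, ... starting from the digit left of the check digit.
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


for code in ("9780306406157", "036000291452", "036000291453"):
    print(code, gtin_is_valid(code))
```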

So what's in the data from BigBox API?

Product:
- Item & parent ID
- UPC
- Store SKU
- In-store bay &/or aisle
- Product specifications
- Description
- Imagery
- Product videos
- Buy Box winner: price and fulfillment info
- Rating & reviews count
- Descriptive attributes

Search results:
- Product details per search result
- Position
- Related queries
- Pagination
- Facets

How can BigBox API be used?
- Product listing management
- Price monitoring
- Category & product trends monitoring
- Market research & competitor intelligence
- Location-specific shipping data
- Rank tracking on Home Depot

...and more, depending on your request parameters or the search result.

Who uses BigBox API? This data is leveraged by software developers, marketers & business owners, sales & business development teams, researchers, and data analysts & engineers, in ecommerce, other retail business, agencies and SaaS platforms.

Anyone in your organization who works with your digital presence can develop business intelligence and strategy using this advanced product data.
SciGRID_gas LKD_Raw
data.niaid.nih.gov
Updated Jul 19, 2024
Diettrich, Jan; Pluta, Adam; Medjroubi, Wided (2024). SciGRID_gas LKD_Raw [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3980984
Dataset updated
Jul 19, 2024
Dataset provided by
DLR-VE
Authors
Diettrich, Jan; Pluta, Adam; Medjroubi, Wided
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The LKD_raw dataset is an outcome of the SciGRID_gas project.

The dataset contains geographical data and metadata on the European gas transport network. The data originates from the gas data of the LKD_EU project (ISBN: 978-3-86780-554-4). The original data repository can be found under DOI 10.5281/zenodo.1044462.

The original data has been partially cleaned up and converted to fit the SciGRID_gas project data structure.

The data is stored in both CSV and GeoJSON files.
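Since GeoJSON files follow the standard FeatureCollection layout, a quick first look is to count features by geometry type. For a gas network one might expect points for nodes and line strings for pipelines, but how SciGRID_gas encodes its network in these files is an assumption here. A minimal standard-library sketch; the function name and the file name in the usage comment are hypothetical.

```python
import json
from collections import Counter


def geometry_counts(geojson_text: str) -> Counter:
    """Count features in a GeoJSON FeatureCollection by geometry type."""
    collection = json.loads(geojson_text)
    return Counter(
        # Features with a null geometry are counted under "None".
        (feature.get("geometry") or {}).get("type", "None")
        for feature in collection.get("features", [])
    )


sample = """{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [8.0, 50.0]}, "properties": {}},
  {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[8.0, 50.0], [9.0, 51.0]]}, "properties": {}},
  {"type": "Feature", "geometry": null, "properties": {}}
]}"""
print(geometry_counts(sample))

# With a real file (name hypothetical):
# with open("example.geojson", encoding="utf-8") as fh:
#     print(geometry_counts(fh.read()))
```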
Books Set in Bath
kaggle.com
zip
Updated Dec 6, 2023
The Devastator (2023). Books Set in Bath [Dataset]. https://www.kaggle.com/datasets/thedevastator/books-set-in-bath/code
Available download formats: zip (7,510 bytes)
Dataset updated
Dec 6, 2023
Authors
The Devastator
License
CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
Description
Books Set in Bath

A catalog of books set in the city of Bath

By Leigh Dodds [source]

About this dataset

The dataset offers insights into various literary works that take place in Bath, providing an opportunity for readers and researchers to explore the rich connections between literature and this historical city. Whether you are interested in local stories or looking for inspiration for your next visit to Bath, this dataset serves as a useful resource.

Each entry includes detailed information such as the unique identifier assigned by LibraryThing (URI), which allows users to access further metadata and book covers using LibraryThing's APIs. Additionally, if available, ISBNs are provided for easy identification of specific editions or versions of each book.

The columns (uri, title, isbn, and author) are formatted consistently, which keeps the data clear and easy to analyze.

How to use the dataset

Dataset Overview

Columns

This dataset consists of four columns that provide key details about each book:

uri: The unique identifier for each book in the LibraryThing database.

title: The title of the book.

isbn: The International Standard Book Number (ISBN) for the book if known.

author: The author of the book.

Getting Started

Before diving into analyzing or exploring this dataset, it's important to understand its structure and familiarize yourself with its columns and values.

To get started:

Load/import it into your preferred data analysis tool or programming language (e.g., Python pandas library).

Follow along with code examples provided below for common tasks using pandas library.

Example Code: Getting Basic Insights

```python
import pandas as pd

# Load the CSV file into a pandas DataFrame
data = pd.read_csv('Library_Thing_Books_Set_in_Bath.csv')

# Print basic insights about the columns and values
print('Number of rows:', data.shape[0])
print('Number of columns:', data.shape[1])
print('Column names:', list(data.columns))
print('Sample data:')
print(data.head())
```

Exploring the Data

Once you have loaded the dataset into your preferred tool, you can begin exploring and analyzing its contents. Here are a few common tasks to get you started:

1. Checking Unique Book Count:

```python
# Count distinct titles
unique_books = data['title'].nunique()
print('Number of unique books:', unique_books)
```

2. Finding Books by a Specific Author:

```python
# Select all rows whose author matches exactly
author_name = 'Jane Austen'
books_by_author = data[data['author'] == author_name]
```

Research Ideas

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: Library_Thing_Books_Set_in_Bath.csv

| Column name | Description |
|:--------------|:--------------|
| uri | The unique identifier for each book in the dataset. (String) |
| title | The title of the book. (String) |
| isbn | The International Standard Book Number (ISBN) for the book, a unique identifier for published books. (String) |
| author | The author of the book. (String) |

Acknowledgements

If you use this dataset in your research, please credit Leigh Dodds.
BlogCatalog dataset
figshare.com
zip
Updated May 30, 2023
Nitin Agarwal; Xufei Wang (2023). BlogCatalog dataset [Dataset]. http://doi.org/10.6084/m9.figshare.11923611.v3
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.11923611.v3
Dataset updated
May 30, 2023
Dataset provided by
Figshare, http://figshare.com/
Authors
Nitin Agarwal; Xufei Wang
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Abstract: BlogCatalog is a social blog directory that manages bloggers and their blogs.

Number of Nodes: 10,312
Number of Edges: 333,983
Missing Values? No

Source: Nitin Agarwal (+), Xufei Wang (*), Huan Liu (*)
(+) Department of Information Science, University of Arkansas at Little Rock. E-mail: nxagarwal@ualr.edu
(*) School of Computing, Informatics and Decision Systems Engineering, Arizona State University. E-mail: huan.liu@asu.edu, xufei.wang@asu.edu

Data Set Information: 2 files are included:
1. nodes.csv: the file of all users. It works as a dictionary of all the users in this dataset, is useful for fast reference, and contains all the node IDs used in the dataset.
2. edges.csv: the friendship network among the bloggers, with each friendship represented as an edge. For example, the row 1,2 means that the blogger with ID "1" is friends with the blogger with ID "2".

Attribute Information: The data was crawled in July 2009 from BlogCatalog (http://www.blogcatalog.com), a social blog directory website, and contains the crawled friendship network. For easier understanding, all contents are organized in CSV format.

Basic statistics:
Number of bloggers: 88,784
Number of friendship pairs: 4,186,390

Relevant Papers:
- Nitin Agarwal and Huan Liu. "Modeling and Data Mining in Blogosphere", Synthesis Lectures on Data Mining and Knowledge Discovery #1, Morgan & Claypool Publishers, Robert Grossman (Editor), August 2009. ISBN: 9781598299083 (paperback), ISBN: 9781598299090 (ebook).
- Nitin Agarwal, Magdiel Galan, Huan Liu, and Shankar Subramanya. WisColl: Collective Wisdom based Blog Clustering. Journal of Information Science, 180(1): 39-61, January 2010.
- Nitin Agarwal, Huan Liu, Sudheendra Murthy, Arunabha Sen, and Xufei Wang. A Social Identity Approach to Identify Familiar Strangers in a Social Network. In Proceedings of the Third International AAAI Conference on Weblogs and Social Media (ICWSM09), pp. 2-9, May 17-20, 2009. San Jose, California.
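To work with the friendship network, edges.csv can be loaded into an adjacency map. Counting nodes and edges directly is also a simple way to see which of the two sets of figures given in the description (10,312 nodes and 333,983 edges, or 88,784 bloggers and 4,186,390 friendship pairs) the files correspond to. A minimal standard-library sketch, assuming edges.csv has no header row and that friendships are undirected; the function name is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_friendships(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Build an undirected adjacency map from 'id1,id2' rows as in edges.csv."""
    graph: Dict[int, Set[int]] = defaultdict(set)
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        a, b = int(row[0]), int(row[1])
        graph[a].add(b)
        graph[b].add(a)  # friendship treated as symmetric
    return dict(graph)


# Usage with the real file:
# with open("edges.csv", newline="") as fh:
#     graph = load_friendships(fh)
# num_nodes = len(graph)
# num_edges = sum(len(friends) for friends in graph.values()) // 2
```

Storing neighbours in sets collapses any repeated pair, so the edge count above is the number of distinct friendships.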
Goodreads's Books
kaggle.com
zip
Updated Mar 23, 2021
Justin Nguyen (2021). Goodreads's Books [Dataset]. https://www.kaggle.com/datasets/khanhdnguyen/goodreadss-books
Available download formats: zip (7,682,847 bytes)
Dataset updated
Mar 23, 2021
Authors
Justin Nguyen
License
CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

This dataset consists of over 20,000 books available on Goodreads, collected by crawling the official Goodreads website directly.
Since December 2020, Goodreads no longer issues new developer keys for its public API, so a Goodreads crawler project was created to retrieve the raw data from the website.

Note - Raw data alert:
- Duplicates
- Missing values
- Invalid values
- Multi-value columns
- Datetime formats

Content

Features:
- bookID: book's ID
- title: book's title
- authors: list of authors
- description: summary description
- num_ratings: total number of ratings
- num_reviews: total number of reviews
- avg_rating: average rating
- language: languages
- publish date: publication date of the current edition
- first_publish_date: publication date of the first version
- series: book's series
- characters: list of characters
- places: book's places
- awards: list of awards won
- genres: list of book's genres
- isbn: International Standard Book Number
- isbn13: International Standard Book Number (13 digits)
- rated 5, 4, 3, 2, 1: number of ratings at each star level
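Several of the raw-data issues listed above (duplicates, invalid values, multi-value columns) can be addressed with a few pandas steps. This is a minimal sketch, not the author's cleaning pipeline: it assumes bookID identifies duplicate rows and that the authors column is a comma-separated string, and it leaves the undocumented datetime formats untouched. The function name is hypothetical.

```python
import pandas as pd


def clean_goodreads(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply basic fixes for some of the raw-data issues listed above."""
    # Duplicates: keep the first row for each bookID.
    df = raw.drop_duplicates(subset="bookID").copy()
    # Invalid values: non-numeric counts and ratings become NaN.
    for col in ("avg_rating", "num_ratings", "num_reviews"):
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Multi-value columns: split authors into a list (comma separator assumed).
    df["authors"] = (
        df["authors"].fillna("").str.split(",")
        .apply(lambda names: [n.strip() for n in names if n.strip()])
    )
    return df


# Usage (file path hypothetical):
# books = clean_goodreads(pd.read_csv("path/to/goodreads_books.csv"))
```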

Inspiration

Data wrangling

Data visualisation

Book classification

Book recommendation

Predict book popularity/ratings

Please upvote if you found this dataset useful.
Goodreads Book Reviews
kaggle.com
zip
Updated Oct 30, 2023
+ more versions
Ahmad (2023). Goodreads Book Reviews [Dataset]. https://www.kaggle.com/datasets/pypiahmad/goodreads-book-reviews1
Available download formats: zip (8,738,754,435 bytes)
Dataset updated
Oct 30, 2023
Authors
Ahmad
License
Apache License, Version 2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
The Goodreads Book Reviews dataset contains reviews and various attributes of the books listed on the Goodreads platform. It captures several tiers of user interaction, from adding a book to a "shelf" to rating and reading it, which makes it useful for studying user behavior, book recommendations, sentiment analysis, and the relationship between book attributes and user interactions.

Basic Statistics:
- Items: 1,561,465
- Users: 808,749
- Interactions: 225,394,930

Metadata:
- Reviews: The text of the reviews provided by users.
- Add-to-shelf, Read, Review Actions: Various interactions users have with the books.
- Book Attributes: Attributes describing the books, including title and ISBN.
- Graph of Similar Books: A graph depicting similarity relations between books.

Example (interaction data):

```json
{
  "user_id": "8842281e1d1347389f2ab93d60773d4d",
  "book_id": "130580",
  "review_id": "330f9c153c8d3347eb914c06b89c94da",
  "isRead": true,
  "rating": 4,
  "date_added": "Mon Aug 01 13:41:57 -0700 2011",
  "date_updated": "Mon Aug 01 13:42:41 -0700 2011",
  "read_at": "Fri Jan 01 00:00:00 -0800 1988",
  "started_at": ""
}
```
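The date fields in the interaction record above use a fixed textual layout (for example "Mon Aug 01 13:41:57 -0700 2011"), and an empty string, as in started_at, marks a missing value. The sketch below parses that layout with `datetime.strptime` and streams records from a gzipped file. It assumes the `.json.gz` files store one JSON object per line, which is not stated here; the helper names and the file name in the usage comment are hypothetical.

```python
import gzip
import json
from datetime import datetime
from typing import Iterator, Optional

# Layout of date_added, date_updated, read_at and started_at in the example record.
GOODREADS_TIME = "%a %b %d %H:%M:%S %z %Y"


def parse_goodreads_time(value: str) -> Optional[datetime]:
    """Parse a timestamp such as 'Mon Aug 01 13:41:57 -0700 2011'; '' becomes None."""
    return datetime.strptime(value, GOODREADS_TIME) if value else None


def iter_records(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a gzipped JSON-lines file (assumed layout)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


# Usage (file name hypothetical):
# for record in iter_records("interactions.json.gz"):
#     added = parse_goodreads_time(record["date_added"])
```

Streaming line by line keeps memory use flat, which matters for files of this size.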

Use Cases:
- Book Recommendations: Creating personalized book recommendations based on user interactions and preferences.
- Sentiment Analysis: Analyzing sentiment in reviews and understanding how different book attributes influence sentiment.
- User Behavior Analysis: Understanding user interaction patterns with books and deriving insights to enhance user engagement.
- Natural Language Processing: Training models to process and analyze user-generated text in reviews.
- Similarity Analysis: Analyzing the graph of similar books to understand book similarities and clustering.

Citation: Please cite the following if you use the data: Mengting Wan, Julian McAuley. Item recommendation on monotonic behavior chains. RecSys, 2018. [PDF](https://cseweb.ucsd.edu/~jmcauley/pdfs/recsys18e.pdf)

Code Samples: A curated set of code samples is provided in the dataset's GitHub repository to help with working with the datasets. These include:
- Downloading datasets without GUI: Downloading the datasets in a non-GUI environment.
- Displaying Sample Records: Showing sample records to get a glimpse of the dataset structure.
- Calculating Basic Statistics: Computing basic statistics to understand the dataset's distribution and characteristics.
- Exploring the Interaction Data: Examining interaction data to understand user-book interaction patterns.
- Exploring the Review Data: Analyzing review data to extract insights from user reviews.

Additional Dataset: - Complete book reviews (~15m multilingual reviews about ~2m books and 465k users): This dataset comprises a comprehensive collection of reviews, showcasing a multilingual facet with reviews about around 2 million books from 465,000 users.

Datasets:

Meta-Data of Books:

Detailed Book Graph (goodreads_books.json.gz): A comprehensive graph detailing around 2.3 million books, acting as a rich source of book attributes and metadata.

Detailed Information of Authors (goodreads_book_authors.json.gz):

An extensive dataset containing detailed information about book authors, essential for understanding author-centric trends and insights.

Detailed Information of Works (goodreads_book_works.json.gz):

This dataset provides abstract information about each book independent of any particular edition, giving a high-level view of each work.


Detailed Information of Book Series (goodreads_book_series.json.gz):

A dataset containing detailed information about book series, useful for understanding series-related trends. Note that the series IDs included here cannot be plugged into Goodreads URLs (a "URL hack").


Extracted Fuzzy Book Genres (goodreads_book_genres_initial.json....
Books_Dataset_GoodReads(May 2024)
kaggle.com
zip
Updated May 17, 2024
Grimm (2024). Books_Dataset_GoodReads(May 2024) [Dataset]. https://www.kaggle.com/datasets/dk123891/books-dataset-goodreadsmay-2024/versions/1
Available download formats: zip (72,249,894 bytes)
Dataset updated
May 17, 2024
Authors
Grimm
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Description: This dataset provides a comprehensive collection of data from GoodReads, a popular platform for readers to discover and review books. It includes information on books, book details, and user reviews, offering insights into the literary world.

Content:
- Books: Information such as book titles, authors, genres, publication dates, and ISBN numbers, plus details about book collections, including the total number of books in each collection and the total votes received for the collection.
- Book Details: Supplementary details about the books, such as summaries, cover images, formats, publication info, and the number of pages.
- User Reviews: User-generated reviews and ratings for the books, along with information about the reviewers, such as their usernames, follower counts, total reviews written, review dates, and review ratings.

Size: Approximately 205 MB.

Format: The files are provided in SQLite (.db) and CSV formats.
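Because part of the data ships as a SQLite database, Python's built-in `sqlite3` module is enough for a first look. The sketch below lists every user table with its row count; table names are discovered at run time rather than assumed, and the function name `describe_sqlite` is hypothetical.

```python
import sqlite3


def describe_sqlite(path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    conn = sqlite3.connect(path)
    try:
        tables = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        counts = {}
        for name in tables:
            # Table names cannot be bound as parameters, so quote them explicitly.
            quoted = '"' + name.replace('"', '""') + '"'
            (n,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
            counts[name] = n
        return counts
    finally:
        conn.close()
```

Calling it on the downloaded `.db` file (whatever its actual name) gives a quick map of which tables hold the books, book details, and reviews before any joins are written.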

Date Range: The dataset covers the period of May 2024.

Source: The data was collected from GoodReads, a leading platform for book lovers to explore, review, and discuss literature.

Purpose: The dataset serves various purposes, including analysis, research, and application in Natural Language Processing (NLP), Deep Learning (DL), and image generation tasks. It can be used to analyze trends in book genres, study user preferences, build recommendation systems, and perform sentiment analysis on user reviews.

Usage: Researchers, data scientists, and enthusiasts can leverage this dataset to gain insights into the literary landscape, understand readers' preferences, and develop novel applications in the field of literature analysis and recommendation systems. Potential use cases include analyzing book trends over time, predicting user preferences based on reviews, and generating book cover images using Deep Learning techniques.

License: This dataset is released under the MIT License, allowing users to freely use, modify, and distribute the data for both commercial and non-commercial purposes.

Example Entries:

- Books:
  - Book Title: "To Kill a Mockingbird"
    Author: Harper Lee
    Genre: Fiction
    Publication Date: July 11, 1960
    Collection Title: Classic Novels
    Total Books in Collection: 100
    Total Votes for Collection: 5000
  - Book Title: "1984"
    Author: George Orwell
    Genre: Dystopian Fiction
    Publication Date: June 8, 1949
    Collection Title: Modern Classics
    Total Books in Collection: 75
    Total Votes for Collection: 3500
- Book Details:
  - Book ID: 123456
    Title: "The Great Gatsby"
    Cover Image URI: https://example.com/great-gatsby-cover.jpg
    Format: Paperback
    Publication Info: Scribner; Reprint edition (May 6, 2003)
    Authorlink: https://www.goodreads.com/author/show/3190.F_Scott_Fitzgerald
    Author: F. Scott Fitzgerald
    Num Pages: 180
- User Reviews:
  - Reviewer ID: user123
    Reviewer Name: John Doe
    Likes on Review: 25
    Review Content: "This book was fantastic! Couldn't put it down."
    Reviewer Followers: 500
    Reviewer Total Reviews: 50
    Review Date: May 15, 2024
    Review Rating: 5/5
Tabela de livros
kaggle.com
Updated Oct 18, 2022
Diego Mariano (2022). Tabela de livros [Dataset]. http://doi.org/10.34740/kaggle/dsv/4348770
Explore at:
Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/4348770
Dataset updated
Oct 18, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Diego Mariano
License
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
Description
A list of books in JSON and CSV format with title, author, ISBN, number of pages, and year. In total, the list contains more than 10,000 books published up to 2021. The dataset was produced to serve as an example for building web service APIs.

Source: public-domain data. The data were collected from the dataset https://www.kaggle.com/datasets/victorstein/livros-skoob created by VICTOR STEIN (2019) and from the SKOOB platform https://www.skoob.com.br/.

Note: since I had to process the table to use it as an example, I thought it was worth sharing the files I generated, while keeping a reference to the original author.
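Since this list includes an ISBN column, checking ISBN check digits is a natural cleaning step. The sketch below applies the standard ISBN-13 rule (digits weighted alternately 1 and 3, total divisible by 10) and ISBN-10 rule (weights 10 down to 1, total divisible by 11, with `X` standing for 10 in the last position). The function name is hypothetical, and how ISBNs are formatted in this particular file is an assumption, so hyphens and spaces are stripped first.

```python
def is_valid_isbn(raw: str) -> bool:
    """Validate an ISBN-10 or ISBN-13 check digit; hyphens/spaces are ignored."""
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 13 and s.isdigit():
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; total must be divisible by 10.
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(s))
        return total % 10 == 0
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        # ISBN-10: weights 10 down to 1; 'X' means 10 in the last position.
        digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        total = sum(w * d for w, d in zip(range(10, 0, -1), digits))
        return total % 11 == 0
    return False
```

For example, `is_valid_isbn("978-93-5768-409-5")` returns `True`, while changing its last digit makes the check fail.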
Footstep Power Generation Tile
kaggle.com
zip
Updated Jul 21, 2023
Atharva Arya (2023). Footstep Power Generation Tile [Dataset]. https://www.kaggle.com/datasets/atharvaarya25/footstep-power-generation
Available download formats: zip (1,208 bytes)
Dataset updated
Jul 21, 2023
Authors
Atharva Arya
License
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Description
Competition: DJ STRIKE 2023

Awarded: 3rd Place

Published: Research Paper (ISBN: 978-93-5768-409-5)

Proceedings: AVISHKAR 2023

Other participations: Schneider Electric GO GREEN Competition

Project Summary

Developed a piezoelectric tile that generates an average of 1 mW per step. Readings from the sensors (a voltage sensor and a current sensor) were sent to Firebase and used for exploratory data analysis (EDA) to identify patterns. The energy stored in the battery could charge a phone for 10 minutes.

About Dataset

The dataset was created manually over three weeks of prototype and product testing. It was built in college with the help of students from various branches.

It contains three numerical features, voltage (V), current (mA), and weight (kg), and one categorical feature: the person's step location on the square tile (center, edge, or corner).

It also contains null values, which should be found and dropped before analysis. The nulls were caused by loose sensor connections or a loose internal tile connection.
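The null-dropping step and the power figure above can be combined in a short, standard-library-only sketch: rows with any missing field are discarded, and electrical power is computed as P = V × I, so volts times milliamps gives milliwatts. The column names (`voltage_V`, `current_mA`, `weight_kg`, `step_location`) and the strings treated as null are assumptions; adjust them to the actual CSV header.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names: rename to match the actual CSV header.
FIELDS = ("voltage_V", "current_mA", "weight_kg", "step_location")
# Strings treated as missing values (assumption).
NULL_TOKENS = {"", "na", "nan", "null", "none"}


def load_clean_rows(text):
    """Parse CSV text and drop every row with a missing value in FIELDS."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        values = [(raw.get(f) or "").strip() for f in FIELDS]
        if any(v.lower() in NULL_TOKENS for v in values):
            continue  # null reading, e.g. from a loose sensor connection
        voltage, current, weight, location = values
        rows.append({
            "voltage_V": float(voltage),
            "current_mA": float(current),
            "weight_kg": float(weight),
            "step_location": location.lower(),  # center, edge, or corner
        })
    return rows


def mean_power_by_location(rows):
    """Mean power per step location in mW, using P [mW] = V [V] * I [mA]."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        totals[r["step_location"]] += r["voltage_V"] * r["current_mA"]
        counts[r["step_location"]] += 1
    return {loc: totals[loc] / counts[loc] for loc in totals}
```

With pandas, `df.dropna()` performs the same row filtering in a single call; the stdlib version just makes the null rule explicit.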
Activity Recognition
kaggle.com
zip
Updated Apr 15, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alex Kudin (2019). Activity Recognition [Dataset]. https://www.kaggle.com/avk256/activity-recognition
Available download formats: zip (23,170,068 bytes)
Dataset updated
Apr 15, 2019
Authors
Alex Kudin
Description
Source:

Uncalibrated accelerometer data were collected from 15 participants performing 7 activities. The dataset poses challenges for identifying and authenticating people from their motion patterns.

Data Set Information:

- The dataset collects data from a wearable accelerometer mounted on the chest.
- Sampling frequency of the accelerometer: 52 Hz
- Accelerometer data are uncalibrated.
- Number of participants: 15
- Number of activities: 7
- Data format: CSV

Attribute Information:

- Data are separated by participant.
- Each file contains the following columns: sequential number, x acceleration, y acceleration, z acceleration, label.
- Labels are encoded as numbers:
  - 1: Working at Computer
  - 2: Standing Up, Walking and Going up/down stairs
  - 3: Standing
  - 4: Walking
  - 5: Going Up/Down Stairs
  - 6: Walking and Talking with Someone
  - 7: Talking while Standing
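A sketch of reading one participant file and turning label codes into activity names. It assumes the files have no header row and exactly the five columns listed above, and it treats the sequential number as a sample index at 52 Hz to derive approximate times; codes outside 1-7 are reported as "unknown". The function names are hypothetical.

```python
import csv
import io
from collections import Counter

SAMPLING_HZ = 52  # sampling frequency stated in the dataset description

ACTIVITY_LABELS = {
    1: "Working at Computer",
    2: "Standing Up, Walking and Going up/down stairs",
    3: "Standing",
    4: "Walking",
    5: "Going Up/Down Stairs",
    6: "Walking and Talking with Someone",
    7: "Talking while Standing",
}


def parse_participant(text):
    """Parse one participant file (assumed header-less: seq, x, y, z, label)."""
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        seq, x, y, z, label = row
        samples.append({
            # Approximate time offset, assuming seq counts samples taken at 52 Hz.
            "t_seconds": int(float(seq)) / SAMPLING_HZ,
            "xyz": (float(x), float(y), float(z)),  # raw, uncalibrated readings
            "activity": ACTIVITY_LABELS.get(int(float(label)), "unknown"),
        })
    return samples


def seconds_per_activity(samples):
    """Approximate time spent in each activity: sample count / sampling rate."""
    counts = Counter(s["activity"] for s in samples)
    return {activity: n / SAMPLING_HZ for activity, n in counts.items()}
```

Because the readings are uncalibrated, comparisons are most meaningful within a participant or after per-participant normalization.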

Relevant Papers:

- Casale, P., Pujol, O., and Radeva, P. 'BeaStreamer-v0.1: a new platform for Multi-Sensors Data Acquisition in Wearable Computing Applications', CVCRD09, ISBN: 978-84-937261-1-9, 2009.
- Casale, P., Pujol, O., and Radeva, P. 'Human activity recognition from accelerometer data using a wearable device', IbPRIA'11, 289-296, Springer-Verlag, 2011.
- Casale, P., Pujol, O., and Radeva, P. 'Personalization and user verification in wearable systems using biometric walking patterns', Personal and Ubiquitous Computing, 16(5), 563-580, 2012.

Citation Request:

Casale, P., Pujol, O., and Radeva, P. 'Personalization and user verification in wearable systems using biometric walking patterns', Personal and Ubiquitous Computing, 16(5), 563-580, 2012.

