17 datasets found
  1. Books dataset, ISBN based

    • kaggle.com
    zip
    Updated Oct 13, 2025
    Cite
    Goulven Furet (2025). Books dataset, ISBN based [Dataset]. https://www.kaggle.com/datasets/goulvenfuret/books-dataset-isbn-based
    Available download formats: zip (367,961,043 bytes)
    Dataset updated
    Oct 13, 2025
    Authors
    Goulven Furet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Nudger is a responsible price comparison tool registered as a public good. This project follows an open-source and open-data approach, featuring open datasets on books and products that are accessible to everyone.

    The data shared by Nudger primarily covers the French market.

    ISBN dataset: contains information on over 6 million books, each identified by its ISBN.

    Nudger is an open and growing project. Feel free to contact us with any questions!

  2. Best Books Ever Dataset

    • zenodo.org
    csv
    Updated Nov 10, 2020
    + more versions
    Cite
    Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
    Available download formats: csv
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lorena Casanova Lozano; Sergio Costa Planells
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset was collected as part of Prac1 of the course Typology and Data Life Cycle in the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).

    The dataset contains 25 variables and 52,478 records corresponding to books on the Goodreads Best Books Ever list (the largest list on the site).

    The original code used to retrieve the dataset is in the GitHub repository github.com/scostap/goodreads_bbe_dataset.

    The data were retrieved in two sets: the first 30,000 books and then the remaining 22,478. Dates in the second chunk were not parsed and reformatted, so publishDate and firstPublishDate are in mm/dd/yyyy format for the first 30,000 records and in Month Day Year format for the rest.
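    Because the two chunks use different date formats, the values need to be normalized before any date-based analysis. The function below is a minimal sketch, assuming the first format parses as %m/%d/%Y and the second looks like "September 14 2008", possibly with an ordinal suffix on the day ("14th"). The function name parse_bbe_date and the ordinal handling are assumptions, not part of the original dataset code.

```python
from datetime import date, datetime
import re
from typing import Optional

# Ordinal day suffixes such as "1st" or "14th" (assumed; may not occur in the data)
_ORDINAL = re.compile(r"(\d+)(st|nd|rd|th)\b")


def parse_bbe_date(raw: Optional[str]) -> Optional[date]:
    """Parse a publishDate/firstPublishDate value in either documented format.

    Tries mm/dd/yyyy first (first 30,000 records), then "Month Day Year"
    (remaining records). Returns None for empty or unparseable values.
    """
    if raw is None:
        return None
    text = raw.strip()
    if not text:
        return None
    # Format 1: numeric mm/dd/yyyy
    try:
        return datetime.strptime(text, "%m/%d/%Y").date()
    except ValueError:
        pass
    # Format 2: "Month Day Year", after dropping any ordinal suffix on the day
    cleaned = _ORDINAL.sub(r"\1", text)
    try:
        return datetime.strptime(cleaned, "%B %d %Y").date()
    except ValueError:
        return None


print(parse_bbe_date("09/14/2008"))           # 2008-09-14
print(parse_bbe_date("September 14th 2008"))  # 2008-09-14
```

    Values that match neither pattern come back as None, so they can be inspected separately instead of being silently coerced.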

    Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, with an example, is in the GitHub repository.

    The 25 fields of the dataset are:

    | Attributes | Definition | Completeness (%) |
    | ------------- | ------------- | ------------- | 
    | bookId | Book Identifier as in goodreads.com | 100 |
    | title | Book title | 100 |
    | series | Series Name | 45 |
    | author | Book's Author | 100 |
    | rating | Global goodreads rating | 100 |
    | description | Book's description | 97 |
    | language | Book's language | 93 |
    | isbn | Book's ISBN | 92 |
    | genres | Book's genres | 91 |
    | characters | Main characters | 26 |
    | bookFormat | Type of binding | 97 |
    | edition | Type of edition (e.g. Anniversary Edition) | 9 |
    | pages | Number of pages | 96 |
    | publisher | Publisher | 93 |
    | publishDate | Publication date | 98 |
    | firstPublishDate | Publication date of first edition | 59 |
    | awards | List of awards | 20 |
    | numRatings | Number of total ratings | 100 |
    | ratingsByStars | Number of ratings by stars | 97 |
    | likedPercent | Derived field: percentage of ratings above 2 stars (as in Goodreads) | 99 |
    | setting | Story setting | 22 |
    | coverImg | URL to cover image | 99 |
    | bbeScore | Score in Best Books Ever list | 100 |
    | bbeVotes | Number of votes in Best Books Ever list | 100 |
    | price | Book's price (extracted from Iberlibro) | 73 |
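    Since likedPercent is described as derived from the star ratings, it can be recomputed from ratingsByStars as a consistency check. This is a minimal sketch assuming the five counts are ordered from 5 stars down to 1 star and have already been converted to integers (in the CSV they may be stored as a string representation of a list); "above 2 stars" is read here as 3, 4 or 5 stars. The function name is hypothetical.

```python
from typing import Sequence


def liked_percent(ratings_by_stars: Sequence[int]) -> float:
    """Percentage of ratings above 2 stars.

    Assumes counts ordered [n5, n4, n3, n2, n1].
    """
    if len(ratings_by_stars) != 5:
        raise ValueError("expected five star counts")
    total = sum(ratings_by_stars)
    if total == 0:
        return 0.0  # no ratings at all
    above_two = sum(ratings_by_stars[:3])  # 5-, 4- and 3-star ratings
    return 100.0 * above_two / total


print(liked_percent([50, 30, 10, 5, 5]))  # 90.0
```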

  3. Books Dataset

    • kaggle.com
    zip
    Updated Dec 20, 2023
    Cite
    Elvin Rustamov (2023). Books Dataset [Dataset]. https://www.kaggle.com/datasets/elvinrustam/books-dataset
    Available download formats: zip (55,469,565 bytes)
    Dataset updated
    Dec 20, 2023
    Authors
    Elvin Rustamov
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Overview: This dataset comprises information scraped from wonderbk.com, a popular online bookstore. The dataset contains details of 103,063 books, with key attributes such as title, authors, description, category, publisher, starting price, and publish date.

    Columns:

    • Title: The title of the book.
    • Authors: The authors of the book.
    • Description: A brief description of the book.
    • Category: The category or genre to which the book belongs.
    • Publisher: The publishing house responsible for the book.
    • Price Starting With ($): The initial price of the book.
    • Publish Date (Month): The month in which the book was published.
    • Publish Date (Year): The year of publication.

  4. Amazon Books details for computer science

    • kaggle.com
    zip
    Updated Sep 26, 2023
    Cite
    Muhammad Uzair Khan (2023). Amazon Books details for computer science [Dataset]. https://www.kaggle.com/datasets/uzair01/amazon-books
    Available download formats: zip (30,197 bytes)
    Dataset updated
    Sep 26, 2023
    Authors
    Muhammad Uzair Khan
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context: The "Amazon Books Dataset" is a curated collection of data on books available on Amazon, with a primary focus on computer science literature. It is aimed at researchers, industry experts, and enthusiasts interested in computer science literature, publishing trends, and the broader book market. Computer science is at the forefront of technological innovation, and understanding the dynamics of the Amazon book market in this field is relevant to stakeholders in the publishing, technology, and retail industries.

    Data Source: The data were compiled from Amazon's book catalog, with a specific emphasis on books related to computer science. The information reflects the catalog as of the dataset's creation date; ratings, prices, and other attributes may have changed since, and new books continue to be published.

    Content: For each book listed on Amazon, with a primary focus on computer science literature, the dataset provides the following columns:

    1. Title: The title of the book.
    2. Description: A concise yet informative overview of the book's content and themes.
    3. Author: The name(s) of the author(s) responsible for creating the literary work.
    4. ISBN-10: The International Standard Book Number in the 10-digit format, aiding in precise identification.
    5. ISBN-13: The International Standard Book Number in the 13-digit format, enhancing global recognition.
    6. Publish Date: The date on which the book was officially published, marking its entry into the literary world.
    7. Edition: Information about the edition of the book, if applicable, providing insights into its various releases.
    8. Best Seller: A binary indicator (1 or 0) signifying whether the book has achieved best-seller status.
    9. Top Rated: A binary indicator (1 or 0) highlighting whether the book has received high ratings and acclaim from readers.
    10. Rating: The average rating awarded to the book by Amazon customers, offering an assessment of its quality.
    11. Review Count: The total number of user reviews posted for the book, reflecting reader engagement and feedback.
    12. Price: The current price of the book on Amazon, which influences purchase decisions.
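    ISBN-10 and ISBN-13 are stored as separate columns, and one can sometimes be derived from the other: a 978-prefixed ISBN-13 consists of "978", the first nine digits of the ISBN-10, and a recomputed check digit. The sketch below validates the ISBN-10 check digit (weights 10 down to 1, sum divisible by 11, X meaning 10) and then computes the ISBN-13 check digit (weights alternating 1 and 3). The function names are hypothetical. ISBN-13s with the 979 prefix have no ISBN-10 equivalent, so the reverse mapping is not always possible.

```python
def isbn10_is_valid(isbn10: str) -> bool:
    """Check an ISBN-10: weighted sum (weights 10..1) must be divisible by 11."""
    s = isbn10.replace("-", "").upper()
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(d * w for d, w in zip(digits, range(10, 0, -1))) % 11 == 0


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a valid ISBN-10 to its 978-prefixed ISBN-13."""
    if not isbn10_is_valid(isbn10):
        raise ValueError(f"invalid ISBN-10: {isbn10!r}")
    core = "978" + isbn10.replace("-", "")[:9]
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


print(isbn10_to_isbn13("0-306-40615-2"))  # 9780306406157
```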

    Potential Use Cases: This dataset, with its focus on computer science literature, presents a unique opportunity for research and analysis in fields such as:

    • Technology Trends: Analyzing emerging trends in computer science literature, including topics, technologies, and subfields.
    • Educational Insights: Investigating the popularity of computer science textbooks and learning materials.
    • Author Impact: Assessing the influence and contributions of authors within the computer science domain.
    • Market Analysis: Understanding pricing dynamics and factors influencing the success of computer science books.

    Researchers, analysts, and computer science enthusiasts can leverage this dataset to explore the intricate world of computer science literature, uncover market trends, and contribute to a deeper understanding of the evolving landscape of technology and knowledge dissemination on one of the world's largest online marketplaces.

  5. Crossref metadata of COCI bibliographic resources, as of November 2018 and...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +1 more
    Updated Jan 24, 2020
    Cite
    Zhu, Yongjun; Yan, Erjia; Peroni, Silvio; Che, Chao (2020). Crossref metadata of COCI bibliographic resources, as of November 2018 and LCC categories of the ISBN entities in the dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_3241744
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    College of Computing and Informatics, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA
    Digital Humanities Advanced Research Centre (DHARC), Department of Classical Philology and Italian Studies, University of Bologna, Via Zamboni 32, 40126 Bologna, Italy
    Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian, China
    Department of Library and Information Science, Sungkyunkwan University, 25-2, Sungkyunkwan-ro, Jongno-gu, Seoul, Republic of Korea
    Authors
    Zhu, Yongjun; Yan, Erjia; Peroni, Silvio; Che, Chao
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The all.zip archive contains a CSV file with citation counts obtained from the November 2018 dump of COCI (https://doi.org/10.6084/m9.figshare.6741422.v3), together with metadata (title, DOI, number of authors, ISBN, ISBN of the container, and type of bibliographic resource) for the related citing and cited entities. The metadata were obtained from the Crossref dump downloaded in October 2018, which is the same dump used to create the COCI data.

    In addition, it contains all the Library of Congress Classification (LCC) categories associated with each ISBN in the previous dataset (file isbn_cat_lcc.csv), retrieved using the services at http://classify.oclc.org/classify2/api_docs/index.html. Two ancillary mapping files have also been added: ddc_to_lcc_mapping.csv, for converting Dewey Decimal Classification (DDC) categories into LCC categories when the service above returned only DDC categories for an ISBN; and lcc_to_wos_mapping.csv, for mapping each LCC category to the related Web of Science research area.
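    The role of the two mapping files can be sketched as a lookup chain: use an ISBN's LCC categories when the service returned them, otherwise convert its DDC categories to LCC, then map LCC to Web of Science research areas. The sketch below works on plain dictionaries; how the CSV files are parsed into them (their column names, and whether one category can map to several areas) is an assumption, as is the function name wos_areas_for_isbn. The mappings in the example call are toy values, not taken from the real files.

```python
from typing import Dict, List


def wos_areas_for_isbn(
    isbn: str,
    lcc_by_isbn: Dict[str, List[str]],
    ddc_by_isbn: Dict[str, List[str]],
    ddc_to_lcc: Dict[str, str],
    lcc_to_wos: Dict[str, str],
) -> List[str]:
    # Prefer the LCC categories returned for the ISBN; if there are none,
    # fall back to its DDC categories converted through the DDC->LCC mapping.
    lcc = lcc_by_isbn.get(isbn) or [
        ddc_to_lcc[ddc] for ddc in ddc_by_isbn.get(isbn, []) if ddc in ddc_to_lcc
    ]
    # Map each LCC category to a Web of Science area, keeping first-seen order
    areas: List[str] = []
    for category in lcc:
        area = lcc_to_wos.get(category)
        if area is not None and area not in areas:
            areas.append(area)
    return areas


# Toy mappings for illustration only (not taken from the real CSV files)
print(wos_areas_for_isbn(
    "isbn-B", {"isbn-A": ["PR"]}, {"isbn-B": ["510"]},
    {"510": "QA"}, {"QA": "Mathematics", "PR": "Literature"},
))  # ['Mathematics']
```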

  6. Book covers dataset

    • kaggle.com
    zip
    Updated Jan 3, 2020
    Cite
    Luka Anicin (2020). Book covers dataset [Dataset]. https://www.kaggle.com/datasets/lukaanicin/book-covers-dataset/discussion
    Available download formats: zip (288,287,453 bytes)
    Dataset updated
    Jan 3, 2020
    Authors
    Luka Anicin
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    A few months ago, I started working on a side project that allows users to search for books by taking a picture of a book cover. At first, the main barrier was the poor quality of the book cover datasets available online, so I created this one.

    For example, this dataset can be used for building recommendation and Content Based Image Retrieval (CBIR) systems.

    Content

    main_dataset.csv

    This CSV file contains all meta information for each book in the dataset.

    • image - URL of the book cover. Use these URLs to download the images yourself if needed.
    • name - Title of the book.
    • author - Author of the book.
    • format - Physical format of the book (e.g. paperback).
    • book_depository_stars - The book's rating on bookdepository.com. (NOTE: because of the time between scraping and your download of the dataset, this value may differ from the one on the website.)
    • price - The book's current price on bookdepository.com. (NOTE: because of the time between scraping and your download of the dataset, this value may differ from the one on the website.)
    • currency - Currency of the prices in the dataset.
    • old_price - The book's old price (if any) on bookdepository.com. (NOTE: because of the time between scraping and your download of the dataset, this value may differ from the one on the website.)
    • isbn - ISBN of the book.
    • category - Category of the book on bookdepository.com.
    • img_paths - Local path of the book's cover (after scraping).

    book-covers

    In this folder you can find all book covers in .jpg format, sorted into one folder per category. The dataset contains 33 classes (book categories), each with close to 1k images, so it is fairly balanced. NOTE: Extract this data into a folder called dataset so that it matches the paths provided in the main_dataset.csv file.
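    To confirm the balance after extracting the archive, the images in each category folder can be counted. A minimal sketch, assuming each of the 33 categories is a sub-folder containing .jpg files; pass the directory that directly holds the category folders (its exact name after extraction is an assumption you should check). The function name is hypothetical.

```python
from collections import Counter
from pathlib import Path


def images_per_category(root: str) -> Counter:
    """Count .jpg files in each category sub-folder directly under `root`."""
    counts: Counter = Counter()
    for category_dir in Path(root).iterdir():
        if category_dir.is_dir():  # ignore loose files such as CSVs
            counts[category_dir.name] = sum(
                1
                for p in category_dir.iterdir()
                if p.is_file() and p.suffix.lower() == ".jpg"
            )
    return counts
```

    Comparing min(counts.values()) with max(counts.values()) on the result gives a quick measure of how balanced the classes are.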

    Acknowledgements

    All data in this dataset was scraped from https://www.bookdepository.com/. (I am not affiliated with them in any way; it's just a great website 👍)

  7. Open e-commerce 1.0: Five years of crowdsourced U.S. Amazon purchase...

    • search.dataone.org
    Updated Dec 16, 2023
    Cite
    Alex Berke; Dan Calacci; Robert Mahari; Takahiro Yabe; Kent Larson; Sandy Pentland (2023). Open e-commerce 1.0: Five years of crowdsourced U.S. Amazon purchase histories with user demographics [Dataset]. http://doi.org/10.7910/DVN/YGLYDY
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Alex Berke; Dan Calacci; Robert Mahari; Takahiro Yabe; Kent Larson; Sandy Pentland
    Description

    This dataset contains longitudinal purchase data from 5,027 Amazon.com users in the US, spanning 2018 through 2022 (amazon-purchases.csv). It also includes demographic data and other consumer-level variables for each user in the dataset. These consumer-level variables were collected through an online survey and are included in survey.csv. The file fields.csv describes the columns in survey.csv, where each field/survey column corresponds to a survey question. The dataset also contains the survey instrument used to collect the data; more details about the survey questions, the possible responses, and the format in which they were presented can be found by viewing it.

    A 'Survey ResponseID' column is present in both amazon-purchases.csv and survey.csv and links a user's survey responses to their Amazon.com purchases. The 'Survey ResponseID' was randomly generated at the time of data collection.

    amazon-purchases.csv: each row in this file corresponds to an Amazon order and has the following columns:

    • Survey ResponseID
    • Order date
    • Shipping address state
    • Purchase price per unit
    • Quantity
    • ASIN/ISBN (Product Code)
    • Title
    • Category

    The data were exported by the Amazon users from Amazon.com and shared by them with their informed consent. PII and other information not listed above were stripped from the data; this processing occurred on the users' machines before the data were shared with the researchers.
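    The 'Survey ResponseID' key makes it straightforward to relate purchases to demographics. The sketch below totals each user's spending (price per unit times quantity) from amazon-purchases.csv rows and attaches the total to that user's survey.csv row. It assumes the CSV headers are spelled exactly as listed above and that rows with a missing or non-numeric price or quantity can be skipped; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def spend_per_user(rows: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Total 'Purchase price per unit' x 'Quantity' per 'Survey ResponseID'."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        try:
            user = row["Survey ResponseID"]
            price = float(row["Purchase price per unit"])
            quantity = float(row["Quantity"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric field
        totals[user] += price * quantity
    return dict(totals)


def join_with_survey(
    totals: Mapping[str, float], survey_rows: Iterable[Mapping[str, str]]
) -> Dict[str, Dict[str, object]]:
    """Attach each user's total spend to their survey row (0.0 if no purchases)."""
    joined: Dict[str, Dict[str, object]] = {}
    for row in survey_rows:
        user = row["Survey ResponseID"]
        joined[user] = {**row, "total_spend": totals.get(user, 0.0)}
    return joined
```

    For example, open amazon-purchases.csv with newline="" and pass csv.DictReader(f) to spend_per_user, then pass the totals and a csv.DictReader over survey.csv to join_with_survey.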

  8. BigBox API | Home Depot Product & Search Results Data

    • datarade.ai
    .json, .csv
    Updated Nov 28, 2022
    Cite
    Traject Data (2022). BigBox API | Home Depot Product & Search Results Data [Dataset]. https://datarade.ai/data-products/bigbox-api-home-depot-product-search-results-data-traject-data
    Available download formats: .json, .csv
    Dataset updated
    Nov 28, 2022
    Dataset authored and provided by
    Traject Data
    Area covered
    United States of America
    Description

    BigBox API provides reliable, real-time Home Depot product, category, reviews, and offers data. All data includes comprehensive coverage of each of the search results in a cleanly structured output.

    You can originate your request from any US zip code to see results as they would appear to customers in that location (e.g. shipping information). BigBox API's high-capacity, global infrastructure assures you the highest level of performance and reliability. For easy integration with your Home Depot data apps and services, data is delivered in JSON or CSV format.

    Data is retrieved by search term, search results page URL, or for single products, by the Home Depot item ID or by global identifiers such as GTIN, ISBN, UPC and EAN. GTIN-based requests work by looking up the GTIN/ISBN/UPC on Home Depot first, then retrieving the product details for the first matching item ID.

    So what's in the data from BigBox API?

    Product:
    • Item & parent ID
    • UPC
    • Store SKU
    • In-store bay and/or aisle
    • Product specifications
    • Description
    • Imagery
    • Product videos
    • Buy Box winner: price and fulfillment info
    • Rating & reviews count
    • Descriptive attributes

    Search results:
    • Product details per search result
    • Position
    • Related queries
    • Pagination
    • Facets

    How can BigBox API be used?
    • Product listing management
    • Price monitoring
    • Category & product trends monitoring
    • Market research & competitor intelligence
    • Location-specific shipping data
    • Rank tracking on Home Depot

    ...and more, depending on your request parameters or the search result.

    Who uses BigBox API? This data is leveraged by software developers, marketers & business owners, sales & business development teams, researchers, and data analysts & engineers, in ecommerce, other retail business, agencies and SaaS platforms.

    Anyone in your organization who works with your digital presence can develop business intelligence and strategy using this advanced product data.

  9. SciGRID_gas LKD_Raw

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Diettrich, Jan; Pluta, Adam; Medjroubi, Wided (2024). SciGRID_gas LKD_Raw [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3980984
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    DLR-VE
    Authors
    Diettrich, Jan; Pluta, Adam; Medjroubi, Wided
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The LKD_raw dataset is an outcome of the SciGRID_gas project.

    The dataset contains geographical and meta information on the European gas transport network. The data originate from the gas data of the LKD_EU project (ISBN: 978-3-86780-554-4). The original data repository is available under DOI 10.5281/zenodo.1044462.

    The original data have been partially cleaned up and converted to fit the SciGRID_gas project data structure.

    The data are stored as both CSV and GeoJSON files.

  10. Books Set in Bath

    • kaggle.com
    zip
    Updated Dec 6, 2023
    Cite
    The Devastator (2023). Books Set in Bath [Dataset]. https://www.kaggle.com/datasets/thedevastator/books-set-in-bath/code
    Available download formats: zip (7,510 bytes)
    Dataset updated
    Dec 6, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Books Set in Bath

    A catalog of books set in the city of Bath

    By Leigh Dodds [source]

    About this dataset

    The dataset offers insights into various literary works that take place in Bath, providing an opportunity for readers and researchers to explore the rich connections between literature and this historical city. Whether you are interested in local stories or looking for inspiration for your next visit to Bath, this dataset serves as a useful resource.

    Each entry includes detailed information such as the unique identifier assigned by LibraryThing (URI), which allows users to access further metadata and book covers using LibraryThing's APIs. Additionally, if available, ISBNs are provided for easy identification of specific editions or versions of each book.

    The four columns (uri, title, isbn, and author) are formatted consistently, which keeps the data clear and easy to analyze.

    How to use the dataset

    Dataset Overview

    Columns

    This dataset consists of four columns that provide important details about each book:

    • uri: The unique identifier for each book in the LibraryThing database.
    • title: The title of the book.
    • isbn: The International Standard Book Number (ISBN) for the book if known.
    • author: The author of the book.

    Getting Started

    Before diving into analyzing or exploring this dataset, it's important to understand its structure and familiarize yourself with its columns and values.

    To get started:

    • Load/import it into your preferred data analysis tool or programming language (e.g., Python pandas library).
    • Follow along with code examples provided below for common tasks using pandas library.

    Example Code: Getting Basic Insights

    import pandas as pd

    # Load the CSV file into a pandas DataFrame
    data = pd.read_csv('Library_Thing_Books_Set_in_Bath.csv')

    # Print basic information about the rows and columns
    print('Number of rows:', data.shape[0])
    print('Number of columns:', data.shape[1])
    print('\nColumn names:', list(data.columns))
    print('\nSample data:')
    print(data.head())
    

    Exploring the Data

    Once you have loaded the dataset into your preferred tool, you can begin exploring and analyzing its contents. Here are a few common tasks to get you started:

    1. Checking Unique Book Count:

    # Count distinct titles (editions sharing a title are counted once)
    unique_books = data['title'].nunique()
    print('Number of unique books:', unique_books)
    

    2. Finding Books by a Specific Author:

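    # Exact string match: the name must be written as it appears in the 'author' column
    author_name = 'Jane Austen'
    books_by_author = data[data['author'] == author_name]
    print(books_by_author[['title', 'isbn']])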
    

    Research Ideas

    Acknowledgements

    If you use this dataset in your research, please credit the original author, Leigh Dodds.

    License

    License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: Library_Thing_Books_Set_in_Bath.csv

    | Column name | Description |
    |:------------|:------------|
    | uri | The unique identifier for each book in the dataset. (String) |
    | title | The title of the book. (String) |
    | isbn | The International Standard Book Number (ISBN) for the book, which is a unique identifier for published books. (String) |
    | author | The author of the book. (String) |

  11. BlogCatalog dataset

    • figshare.com
    zip
    Updated May 30, 2023
    Cite
    Nitin Agarwal; Xufei Wang (2023). BlogCatalog dataset [Dataset]. http://doi.org/10.6084/m9.figshare.11923611.v3
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Nitin Agarwal; Xufei Wang
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Abstract: BlogCatalog is a social blog directory that manages bloggers and their blogs.

    • Number of nodes: 10,312
    • Number of edges: 333,983
    • Missing values? No

    Source: Nitin Agarwal (+), Xufei Wang (*), Huan Liu (*)
    (+) Department of Information Science, University of Arkansas at Little Rock. E-mail: nxagarwal@ualr.edu
    (*) School of Computing, Informatics and Decision Systems Engineering, Arizona State University. E-mail: huan.liu@asu.edu, xufei.wang@asu.edu

    Data Set Information: two files are included:

    1. nodes.csv: the file of all users. It works as a dictionary of all the users in this dataset for fast reference and contains all the node IDs used in the dataset.
    2. edges.csv: the friendship network among the bloggers, with each friendship represented as an edge. For example, the row 1,2 means that the blogger with ID "1" is a friend of the blogger with ID "2".

    Attribute Information: This dataset was crawled in July 2009 from BlogCatalog (http://www.blogcatalog.com), a social blog directory website, and contains the crawled friendship network. For easier understanding, all the contents are organized in CSV format.

    Basic statistics:
    • Number of bloggers: 88,784
    • Number of friendship pairs: 4,186,390

    Relevant Papers:
    • Nitin Agarwal and Huan Liu. "Modeling and Data Mining in Blogosphere", Synthesis Lectures on Data Mining and Knowledge Discovery #1, Morgan & Claypool Publishers, Robert Grossman (Editor), August 2009. ISBN: 9781598299083 (paperback), ISBN: 9781598299090 (ebook).
    • Nitin Agarwal, Magdiel Galan, Huan Liu, and Shankar Subramanya. WisColl: Collective Wisdom based Blog Clustering. Journal of Information Science, 180(1): 39-61, January 2010.
    • Nitin Agarwal, Huan Liu, Sudheendra Murthy, Arunabha Sen, and Xufei Wang. "A Social Identity Approach to Identify Familiar Strangers in a Social Network". In Proceedings of the Third International AAAI Conference on Weblogs and Social Media (ICWSM09), pp. 2-9, May 17-20, 2009. San Jose, California.
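    The edge list in edges.csv can be loaded into an adjacency structure for network analysis. A minimal sketch using only the standard library, assuming the file has no header row, each line holds two blogger IDs as in the "1,2" example, and friendship is mutual (undirected); the function name is hypothetical. Using sets also collapses any duplicate edges.

```python
import csv
from collections import defaultdict
from typing import Dict, Set, TextIO


def load_friendships(edges_file: TextIO) -> Dict[str, Set[str]]:
    """Build an undirected adjacency list from lines such as "1,2"."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(edges_file):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        a, b = row[0].strip(), row[1].strip()
        # Friendship is treated as mutual, so store the edge in both directions
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)
```

    A blogger's number of friends (degree) is then len(graph[blogger_id]).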

  12. Goodreads's Books

    • kaggle.com
    zip
    Updated Mar 23, 2021
    Cite
    Justin Nguyen (2021). Goodreads's Books [Dataset]. https://www.kaggle.com/datasets/khanhdnguyen/goodreadss-books
    Available download formats: zip (7,682,847 bytes)
    Dataset updated
    Mar 23, 2021
    Authors
    Justin Nguyen
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset consists of over 20,000 books available on Goodreads, collected by crawling the official Goodreads website.
    Since December 2020, Goodreads no longer issues new developer keys for its public API, so a Goodreads crawler project was created to retrieve the raw data from the website.

    Note: raw data alert. The data contain:
    • Duplicates
    • Missing values
    • Invalid values
    • Multi-value columns
    • Mixed datetime formats

    Content

    Features:
    • bookID: book's ID
    • title: book's title
    • authors: list of authors
    • description: summary description
    • num_ratings: total number of ratings
    • num_reviews: total number of reviews
    • avg_rating: average rating
    • language: languages
    • publish date: publication date of this edition
    • first_publish_date: publication date of the first edition
    • series: book's series
    • characters: list of characters
    • places: book's places
    • awards: list of awards won
    • genres: list of book's genres
    • isbn: International Standard Book Number
    • isbn13: International Standard Book Number (13 digits)
    • rated 5, 4, 3, 2, 1: number of ratings at each star level
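    Given the raw-data alert about invalid values, one simple check is to recompute the average rating from the per-star counts (the "rated 5, 4, 3, 2, 1" columns) and compare it with avg_rating. The sketch assumes those columns have already been read into a {stars: count} mapping; the function names and the 0.01 tolerance are arbitrary choices, not part of the dataset.

```python
from typing import Mapping, Optional


def mean_rating(counts: Mapping[int, int]) -> Optional[float]:
    """Weighted mean of star values, e.g. counts = {5: n5, 4: n4, ..., 1: n1}."""
    total = sum(counts.values())
    if total == 0:
        return None  # no ratings: the mean is undefined
    return sum(stars * n for stars, n in counts.items()) / total


def rating_mismatch(avg_rating: float, counts: Mapping[int, int], tol: float = 0.01) -> bool:
    """True if avg_rating differs from the recomputed mean by more than tol."""
    recomputed = mean_rating(counts)
    return recomputed is not None and abs(recomputed - avg_rating) > tol


print(mean_rating({5: 10, 4: 5, 3: 0, 2: 0, 1: 5}))  # 3.75
```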

    Inspiration

    • Data wrangling
    • Data visualisation
    • Book classification
    • Book recommendation
    • Predict book popularity/ratings

    Please upvote if you find this dataset useful.

  13. Goodreads Book Reviews

    • kaggle.com
    zip
    Updated Oct 30, 2023
    + more versions
    Cite
    Ahmad (2023). Goodreads Book Reviews [Dataset]. https://www.kaggle.com/datasets/pypiahmad/goodreads-book-reviews1
    Available download formats: zip (8,738,754,435 bytes)
    Dataset updated
    Oct 30, 2023
    Authors
    Ahmad
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The Goodreads Book Reviews dataset encapsulates a wealth of reviews and various attributes concerning the books listed on the Goodreads platform. A distinguishing feature of this dataset is its capture of multiple tiers of user interaction, ranging from adding a book to a "shelf", to rating and reading it. This dataset is a treasure trove for those interested in understanding user behavior, book recommendations, sentiment analysis, and the interplay between various attributes of books and user interactions.

    Basic Statistics:
    • Items: 1,561,465
    • Users: 808,749
    • Interactions: 225,394,930

    Metadata:
    • Reviews: the text of the reviews provided by users.
    • Add-to-shelf, Read, Review Actions: the various interactions users have with the books.
    • Book Attributes: attributes describing the books, including title and ISBN.
    • Graph of Similar Books: a graph of similarity relations between books.

    Example (interaction data):

        {
          "user_id": "8842281e1d1347389f2ab93d60773d4d",
          "book_id": "130580",
          "review_id": "330f9c153c8d3347eb914c06b89c94da",
          "isRead": true,
          "rating": 4,
          "date_added": "Mon Aug 01 13:41:57 -0700 2011",
          "date_updated": "Mon Aug 01 13:42:41 -0700 2011",
          "read_at": "Fri Jan 01 00:00:00 -0800 1988",
          "started_at": ""
        }
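    The timestamps in the interaction records follow a fixed pattern ("Mon Aug 01 13:41:57 -0700 2011"), which Python's datetime.strptime can parse with the directives %a %b %d %H:%M:%S %z %Y; empty fields such as started_at are treated as missing. The sketch also includes a reader for the .json.gz files, assuming (this is not stated above) that each file stores one JSON object per line. The function names are hypothetical.

```python
import gzip
import json
from datetime import datetime
from typing import Iterator, Optional

# Matches strings such as "Mon Aug 01 13:41:57 -0700 2011"
GOODREADS_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_goodreads_date(value: str) -> Optional[datetime]:
    """Parse a timestamp field into an aware datetime; "" becomes None."""
    if not value:
        return None
    return datetime.strptime(value, GOODREADS_DATE_FORMAT)


def iter_records(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a gzipped JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


print(parse_goodreads_date("Mon Aug 01 13:41:57 -0700 2011"))  # 2011-08-01 13:41:57-07:00
```

    Streaming line by line keeps memory use flat, which matters for files of this size.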

    Use Cases:
      • Book Recommendations: Creating personalized book recommendations based on user interactions and preferences.
      • Sentiment Analysis: Analyzing sentiment in reviews and understanding how different book attributes influence sentiment.
      • User Behavior Analysis: Understanding user interaction patterns with books and deriving insights to enhance user engagement.
      • Natural Language Processing: Training models to process and analyze user-generated text in reviews.
      • Similarity Analysis: Analyzing the graph of similar books to understand book similarities and clustering.

    Citation: Please cite the following if you use the data: Mengting Wan and Julian McAuley. "Item recommendation on monotonic behavior chains." RecSys, 2018. [PDF](https://cseweb.ucsd.edu/~jmcauley/pdfs/recsys18e.pdf)

    Code Samples: The dataset's GitHub repository provides a curated set of code samples for working with the datasets. These include:
      • Downloading datasets without a GUI: downloading the data in a non-GUI environment.
      • Displaying Sample Records: showing sample records to illustrate the dataset structure.
      • Calculating Basic Statistics: computing basic statistics to understand the dataset's distribution and characteristics.
      • Exploring the Interaction Data: examining the interaction data to understand user-book interaction patterns.
      • Exploring the Review Data: analyzing the review data to extract insights from user reviews.
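    The repository's own samples are not reproduced here. As a rough stand-in for "Displaying Sample Records", the sketch below reads the first few records of a `.json.gz` file without decompressing the whole archive. It assumes each file is gzip-compressed JSON Lines, meaning one JSON object per line; check this against the actual files. The helper names `iter_records` and `head` are hypothetical.

```python
import gzip
import json
from itertools import islice
from typing import Iterator, List


def iter_records(path: str) -> Iterator[dict]:
    """Yield one dict per line of a gzip-compressed JSON Lines file (assumed layout)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


def head(path: str, n: int = 3) -> List[dict]:
    """Return the first n records, streaming rather than loading the whole file."""
    return list(islice(iter_records(path), n))
```

    For example, `head("goodreads_books.json.gz", n=2)` would return the first two book records, assuming that file has been downloaded to the working directory. Streaming matters here because the full archive is several gigabytes.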

    Additional Dataset:
      • Complete book reviews: a collection of about 15 million multilingual reviews covering about 2 million books, written by 465,000 users.

    Datasets:

    Meta-Data of Books:

    • Detailed Book Graph (goodreads_books.json.gz): A comprehensive graph detailing around 2.3 million books, acting as a rich source of book attributes and metadata.
    • Detailed Information of Authors (goodreads_book_authors.json.gz):
      • An extensive dataset containing detailed information about book authors, essential for understanding author-centric trends and insights.
    • Detailed Information of Works (goodreads_book_works.json.gz):
      • This dataset provides abstract information about a book disregarding any particular editions, facilitating a high-level understanding of each work.
    • Detailed Information of Book Series (goodreads_book_series.json.gz):
      • Detailed information about book series, useful for studying series-related trends. Note that the series IDs included here cannot be substituted into Goodreads URLs to reach the corresponding series pages.
    • Extracted Fuzzy Book Genres (goodreads_book_genres_initial.json....
  14. Books_Dataset_GoodReads(May 2024)

    • kaggle.com
    zip
    Updated May 17, 2024
    Cite
    Grimm (2024). Books_Dataset_GoodReads(May 2024) [Dataset]. https://www.kaggle.com/datasets/dk123891/books-dataset-goodreadsmay-2024/versions/1
    Available download formats: zip (72,249,894 bytes)
    Dataset updated
    May 17, 2024
    Authors
    Grimm
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset provides a comprehensive collection of data from GoodReads, a popular platform where readers discover and review books. It includes information on books, book details, and user reviews.

    Content:
      • Books: Book titles, authors, genres, publication dates, and ISBN numbers. It also includes details about book collections, including the total number of books in each collection and the total votes received for the collection.
      • Book Details: Supplementary details about the books, such as summaries, cover images, formats, publication info, and the number of pages.
      • User Reviews: User-generated reviews and ratings for the books, along with information about the reviewers, such as their usernames, follower counts, total reviews written, review dates, and review ratings.

    Size: The dataset is approximately 205MB in size.

    Format: The dataset files are provided in SQLite (.db) and CSV formats.
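    The description does not list the file names or the SQLite schema. A quick first step is therefore to inventory whatever tables the `.db` file contains. The sketch below does this with Python's built-in `sqlite3` module. It opens the file read-only through a URI so that a mistyped path is not silently created as an empty database. The function name `describe_sqlite` is hypothetical.

```python
import sqlite3
from typing import Dict


def describe_sqlite(path: str) -> Dict[str, int]:
    """Map each table in a SQLite file to its row count (read-only access)."""
    con = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        names = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            quoted = name.replace('"', '""')  # escape embedded quotes in the identifier
            counts[name] = con.execute(f'SELECT COUNT(*) FROM "{quoted}"').fetchone()[0]
        return counts
    finally:
        con.close()
```

    Call it with the `.db` file from the download, for example `describe_sqlite("path/to/file.db")`. The path is a placeholder because the description does not name the file.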

    Date Range: The dataset covers the period of May 2024.

    Source: The data was collected from GoodReads, a leading platform for book lovers to explore, review, and discuss literature.

    Purpose: The dataset serves various purposes, including analysis, research, and application in Natural Language Processing (NLP), Deep Learning (DL), and image generation tasks. It can be used to analyze trends in book genres, study user preferences, build recommendation systems, and perform sentiment analysis on user reviews.

    Usage: Researchers, data scientists, and enthusiasts can leverage this dataset to gain insights into the literary landscape, understand readers' preferences, and develop novel applications in the field of literature analysis and recommendation systems. Potential use cases include analyzing book trends over time, predicting user preferences based on reviews, and generating book cover images using Deep Learning techniques.

    License: This dataset is released under the MIT License, allowing users to freely use, modify, and distribute the data for both commercial and non-commercial purposes.

    Example Entries:
      • Books:
        • Book Title: "To Kill a Mockingbird"; Author: Harper Lee; Genre: Fiction; Publication Date: July 11, 1960; Collection Title: Classic Novels; Total Books in Collection: 100; Total Votes for Collection: 5000
        • Book Title: "1984"; Author: George Orwell; Genre: Dystopian Fiction; Publication Date: June 8, 1949; Collection Title: Modern Classics; Total Books in Collection: 75; Total Votes for Collection: 3500
      • Book Details:
      • User Reviews:
        • Reviewer ID: user123; Reviewer Name: John Doe; Likes on Review: 25; Review Content: "This book was fantastic! Couldn't put it down."; Reviewer Followers: 500; Reviewer Total Reviews: 50; Review Date: May 15, 2024; Review Rating: 5/5
  15. Tabela de livros

    • kaggle.com
    Updated Oct 18, 2022
    Cite
    Diego Mariano (2022). Tabela de livros [Dataset]. http://doi.org/10.34740/kaggle/dsv/4348770
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 18, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Diego Mariano
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    A list of books in JSON and CSV formats with title, author, ISBN, page count, and year. In total, the list contains more than 10,000 books published up to 2021. The dataset was produced as an example for building web service APIs.
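    Because the table is meant as an example for web service APIs, a typical first step is to index the rows by ISBN so that a request such as `GET /books/<isbn>` can be answered with one dictionary lookup. The sketch below shows this with a standard ISBN-13 check-digit test, in which the digits are weighted 1, 3, 1, 3, … and the total must be divisible by 10. The column name `isbn` is an assumption; the description names the fields but not the CSV headers. The function names are hypothetical, and the HTTP layer is left out.

```python
import csv
import json
from typing import Dict, Iterable, Optional


def normalize_isbn(raw: str) -> str:
    """Remove hyphens and spaces; upper-case an ISBN-10 'x' check character."""
    return "".join(ch for ch in raw if ch.isalnum()).upper()


def isbn13_is_valid(isbn: str) -> bool:
    """ISBN-13 rule: digits weighted 1,3,1,3,... must sum to a multiple of 10."""
    if len(isbn) != 13 or not isbn.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(isbn))
    return total % 10 == 0


def build_index(rows: Iterable[dict], isbn_field: str = "isbn") -> Dict[str, dict]:
    """Index rows by normalized ISBN, skipping rows whose ISBN is empty."""
    index: Dict[str, dict] = {}
    for row in rows:
        key = normalize_isbn(row.get(isbn_field) or "")
        if key:
            index[key] = row
    return index


def lookup_json(index: Dict[str, dict], isbn: str) -> Optional[str]:
    """Return the matching row as JSON (the body a GET handler might send), else None."""
    row = index.get(normalize_isbn(isbn))
    return json.dumps(row, ensure_ascii=False) if row is not None else None
```

    In use, the CSV would be loaded with `build_index(csv.DictReader(open(path, encoding="utf-8", newline="")))`, where `path` is the downloaded file. `isbn13_is_valid` can flag malformed keys before they are served. `ensure_ascii=False` preserves accented Portuguese titles in the JSON output.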

    Source: public-domain data. The data were collected from the dataset https://www.kaggle.com/datasets/victorstein/livros-skoob created by VICTOR STEIN (2019) and from the SKOOB platform https://www.skoob.com.br/.

    Note: since I had to process the table to use it as an example, I thought it was worth sharing the files I generated, while keeping a reference to the original author.

  16. Footstep Power Generation Tile

    • kaggle.com
    zip
    Updated Jul 21, 2023
    Cite
    Atharva Arya (2023). Footstep Power Generation Tile [Dataset]. https://www.kaggle.com/datasets/atharvaarya25/footstep-power-generation
    Available download formats: zip (1,208 bytes)
    Dataset updated
    Jul 21, 2023
    Authors
    Atharva Arya
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Competition: DJ STRIKE 2023

    Awarded: 3rd Place

    Published: Research Paper (ISBN: 978-93-5768-409-5)

    Proceedings: AVISHKAR 2023

    Other participations: Schneider Electric GO GREEN Competition

    Project Summary

    Developed a piezoelectric tile that generates an average of 1 mW per step. Readings from the sensors (a voltage sensor and a current sensor) were sent to Firebase and used for exploratory data analysis (EDA) to identify patterns. The energy stored in the battery could charge a phone for 10 minutes.

    About Dataset

    1. The dataset was created manually over three weeks of prototype and product testing. It was compiled in college with the help of students from various branches.

    2. It contains three numerical features, voltage (V), current (mA), and weight (kg), and one categorical feature: the location of the person's step on the square tile (center, edge, or corner).

    3. It also contains null values, which should be identified and dropped during analysis. The nulls were caused by loose connections in the sensors or in the tile's internal wiring.
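    The preprocessing described above can be sketched with the standard library alone. The sketch drops incomplete rows and then compares instantaneous power by step location, using P = V × I; volts multiplied by milliamps gives milliwatts. The column names (`voltage`, `current`, `weight`, `location`) and the set of null markers are assumptions, since the description names the quantities but not the CSV headers. The description also does not say whether the reported 1 mW average was computed this way.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical CSV headers; the description gives quantities and units, not column names.
FIELDS = ("voltage", "current", "weight", "location")
MISSING = {"", "nan", "null", "none", "na"}  # assumed spellings of a null value


def is_complete(row: dict) -> bool:
    """True if every expected field is present and not a null marker."""
    return all((row.get(f) or "").strip().lower() not in MISSING for f in FIELDS)


def mean_power_by_location(rows: Iterable[dict]) -> Dict[str, float]:
    """Mean of P = V * I per step location; V in volts times I in mA gives mW."""
    total: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for row in rows:
        if not is_complete(row):
            continue  # drop readings affected by loose connections
        location = row["location"].strip().lower()
        total[location] += float(row["voltage"]) * float(row["current"])
        count[location] += 1
    return {loc: total[loc] / count[loc] for loc in total}
```

    With the downloaded file open as `fh`, `mean_power_by_location(csv.DictReader(fh))` returns one average per location. The file name is not given in the description. Lower-casing the location label keeps "Corner" and "corner" together.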

  17. Activity Recognition

    • kaggle.com
    zip
    Updated Apr 15, 2019
    Cite
    Alex Kudin (2019). Activity Recognition [Dataset]. https://www.kaggle.com/avk256/activity-recognition
    Available download formats: zip (23,170,068 bytes)
    Dataset updated
    Apr 15, 2019
    Authors
    Alex Kudin
    Description

    Source:

    Uncalibrated accelerometer data were collected from 15 participants performing 7 activities. The dataset poses the problem of identifying and authenticating people from their motion patterns.

    Data Set Information:

    • Data come from a wearable accelerometer mounted on the chest
    • Sampling frequency of the accelerometer: 52 Hz
    • Accelerometer data are uncalibrated
    • Number of participants: 15
    • Number of activities: 7
    • Data format: CSV

    Attribute Information:

    • Data are separated by participant
    • Each file contains the following fields: sequential number, x acceleration, y acceleration, z acceleration, label
    • Labels are encoded as numbers:
      • 1: Working at Computer
      • 2: Standing Up, Walking and Going up/down stairs
      • 3: Standing
      • 4: Walking
      • 5: Going Up/Down Stairs
      • 6: Walking and Talking with Someone
      • 7: Talking while Standing
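    Given the file layout above (sequential number, x, y, z, label) and the stated 52 Hz sampling rate, a common starting point is to cut each participant's stream into fixed windows. Simple features, such as the mean acceleration magnitude, are then computed per window. The sketch below does this with one-second, non-overlapping windows and a majority label per window. It assumes the files have no header row. The window length, the magnitude feature, and the function names are illustrative choices rather than part of the dataset.

```python
import csv
import math
from collections import Counter
from typing import Iterable, List, Tuple

# Label codes as listed in the attribute information.
ACTIVITIES = {
    1: "Working at Computer",
    2: "Standing Up, Walking and Going up/down stairs",
    3: "Standing",
    4: "Walking",
    5: "Going Up/Down Stairs",
    6: "Walking and Talking with Someone",
    7: "Talking while Standing",
}
SAMPLING_HZ = 52  # sampling frequency stated in the description

Sample = Tuple[float, float, float, int]


def read_participant(lines: Iterable[str]) -> List[Sample]:
    """Parse rows of `sequential number, x, y, z, label`; assumes no header row."""
    samples: List[Sample] = []
    for row in csv.reader(lines):
        if len(row) < 5:
            continue  # skip blank or truncated lines
        _, x, y, z, label = row[:5]
        samples.append((float(x), float(y), float(z), int(label)))
    return samples


def window_features(samples: List[Sample], window_s: float = 1.0) -> List[Tuple[float, int]]:
    """(mean magnitude, majority label) for each full, non-overlapping window.

    Magnitudes are in the sensor's raw, uncalibrated units; a trailing
    partial window is dropped.
    """
    size = int(SAMPLING_HZ * window_s)
    features = []
    for start in range(0, len(samples) - size + 1, size):
        chunk = samples[start:start + size]
        mean_mag = sum(math.sqrt(x * x + y * y + z * z) for x, y, z, _ in chunk) / size
        majority = Counter(label for *_, label in chunk).most_common(1)[0][0]
        features.append((mean_mag, majority))
    return features
```

    `ACTIVITIES[label]` maps a window's majority label back to its activity name. Because the data are uncalibrated, magnitudes are best compared within a participant, not across participants.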

    Relevant Papers:

    • Casale, P., Pujol, O., and Radeva, P. 'BeaStreamer-v0.1: a new platform for Multi-Sensors Data Acquisition in Wearable Computing Applications', CVCRD09, ISBN: 978-84-937261-1-9, 2009.

    • Casale, P., Pujol, O., and Radeva, P. 'Human activity recognition from accelerometer data using a wearable device', IbPRIA'11, 289-296, Springer-Verlag, 2011.

    • Casale, P., Pujol, O., and Radeva, P. 'Personalization and user verification in wearable systems using biometric walking patterns', Personal and Ubiquitous Computing, 16(5), 563-580, 2012.

    Citation Request:

    Casale, P., Pujol, O., and Radeva, P. 'Personalization and user verification in wearable systems using biometric walking patterns', Personal and Ubiquitous Computing, 16(5), 563-580, 2012.
