Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Nudger is a responsible price comparison tool registered as a public good. This project follows an open-source and open-data approach, featuring open datasets on books and products that are accessible to everyone.
The data shared by Nudger primarily covers the French market.
ISBN Dataset: Contains information on over 6 million books identified by their ISBN numbers.
Nudger is an open and growing project. Feel free to contact us with any questions!
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The dataset was collected as part of Prac1 of the subject Typology and Data Life Cycle of the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).
The dataset contains 25 variables and 52,478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).
The original code used to retrieve the dataset is available in the GitHub repository github.com/scostap/goodreads_bbe_dataset.
The data was retrieved in two sets: the first 30,000 books and then the remaining 22,478. Dates were not parsed and reformatted in the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30,000 records and in Month Day Year format for the rest.
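Because the two chunks use different date formats, a loader has to try both. The following is a minimal sketch, assuming the first chunk looks like `09/14/2008` and the second like `September 14 2008` (optionally with an ordinal suffix such as `14th`); the exact spellings in the file are an assumption, and the helper name `parse_publish_date` is hypothetical.

```python
import re
from datetime import date, datetime
from typing import Optional

# Strips ordinal suffixes such as "14th" -> "14" (assumed to occur in the second chunk).
_ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)\b")

def parse_publish_date(raw: str) -> Optional[date]:
    """Try both date layouts described above; return None if neither matches."""
    text = _ORDINAL.sub("", raw.strip())
    for fmt in ("%m/%d/%Y", "%B %d %Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```

Values that match neither layout come back as `None`, so they can be counted and inspected rather than silently coerced.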
Book cover images can be optionally downloaded from the url in the 'coverImg' field. Python code for doing so and an example can be found on the github repo.
The 25 fields of the dataset are:
| Attributes | Definition | Completeness (%) |
| ------------- | ------------- | ------------- |
| bookId | Book Identifier as in goodreads.com | 100 |
| title | Book title | 100 |
| series | Series Name | 45 |
| author | Book's Author | 100 |
| rating | Global goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (e.g., Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Publisher | 93 |
| publishDate | Publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Number of total ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL to cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
https://creativecommons.org/publicdomain/zero/1.0/
Overview: This dataset comprises information scraped from wonderbk.com, a popular online bookstore. The dataset contains details of 103,063 books, with key attributes such as title, authors, description, category, publisher, starting price, and publish date.
Columns:
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Context: The "Amazon Books Dataset" is a meticulously curated collection of data related to books available on the Amazon platform, with a primary focus on computer science literature. This dataset holds significant importance for researchers, industry experts, and enthusiasts who are interested in gaining profound insights into the world of computer science literature, publishing trends, and the broader book market. Computer science is at the forefront of technological innovation and is a driving force in shaping the modern world. Understanding the dynamics of the Amazon book market, particularly in the realm of computer science, is essential for various stakeholders in the publishing, technology, and retail industries.
Data Source: The data for this dataset has been meticulously sourced and compiled from Amazon's extensive catalog of books, with a specific emphasis on books related to computer science. The information is current as of the dataset's creation date and may continue to evolve as new books are published and existing ones undergo changes in attributes such as ratings and prices.
Content: The "Amazon Books Catalog Dataset" provides a rich and comprehensive array of data points for each book listed on Amazon, with a primary focus on computer science literature. These data points empower users to conduct in-depth analyses and uncover valuable insights. Here are the key columns included in this dataset:
Potential Use Cases: This dataset, with its focus on computer science literature, presents a unique opportunity for research and analysis in fields such as:
Researchers, analysts, and computer science enthusiasts can leverage this dataset to explore the intricate world of computer science literature, uncover market trends, and contribute to a deeper understanding of the evolving landscape of technology and knowledge dissemination on one of the world's largest online marketplaces.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The all.zip CSV file (zipped) contains citation counts obtained from the November 2018 dump of COCI (https://doi.org/10.6084/m9.figshare.6741422.v3) and some metadata (title, DOI, number of authors, ISBN, ISBN of the container, type of the bibliographic resource) of the related citing and cited entities obtained by using the Crossref dump downloaded in October 2018 – which is the same dump used to create the COCI data.
In addition, it contains all the Library of Congress Classification (LCC) categories associated with each ISBN in the previous dataset (file isbn_cat_lcc.csv), according to the data retrieved using the services at http://classify.oclc.org/classify2/api_docs/index.html. Two ancillary mapping files have also been added: one (ddc_to_lcc_mapping.csv) for converting Dewey Decimal Classification (DDC) categories into LCC categories, in case the service mentioned above returned only DDC categories for some ISBNs; the other (lcc_to_wos_mapping.csv) for mapping each LCC category to the related Web of Science research area.
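The lookup chain described above (ISBN to LCC, with a DDC fallback, then LCC to Web of Science research area) can be sketched as follows. This is only an illustration: the column layout of the CSV files is not described here, so the sketch works on in-memory dictionaries, and every key and value in the sample data is a made-up placeholder rather than content from the files.

```python
from typing import Dict, List

def lcc_for_isbn(isbn: str,
                 isbn_to_lcc: Dict[str, List[str]],
                 isbn_to_ddc: Dict[str, List[str]],
                 ddc_to_lcc: Dict[str, str]) -> List[str]:
    """Return LCC categories for an ISBN, falling back to DDC -> LCC conversion."""
    direct = isbn_to_lcc.get(isbn, [])
    if direct:
        return list(direct)
    mapped: List[str] = []
    for ddc in isbn_to_ddc.get(isbn, []):
        lcc = ddc_to_lcc.get(ddc)
        if lcc is not None and lcc not in mapped:
            mapped.append(lcc)
    return mapped

def research_areas(isbn: str,
                   isbn_to_lcc: Dict[str, List[str]],
                   isbn_to_ddc: Dict[str, List[str]],
                   ddc_to_lcc: Dict[str, str],
                   lcc_to_wos: Dict[str, str]) -> List[str]:
    """Map an ISBN to research areas via its LCC categories."""
    areas: List[str] = []
    for lcc in lcc_for_isbn(isbn, isbn_to_lcc, isbn_to_ddc, ddc_to_lcc):
        area = lcc_to_wos.get(lcc)
        if area is not None and area not in areas:
            areas.append(area)
    return areas

# Placeholder data (hypothetical, for illustration only).
ISBN_TO_LCC = {"isbn-A": ["QA"]}
ISBN_TO_DDC = {"isbn-B": ["004"]}
DDC_TO_LCC = {"004": "QA"}
LCC_TO_WOS = {"QA": "Mathematics"}
```

With the placeholder data, `isbn-A` resolves directly through its LCC category, while `isbn-B` only resolves through the DDC fallback.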
https://creativecommons.org/publicdomain/zero/1.0/
A few months ago, I started working on a side project that allows users to search for books by taking a picture of a book cover. The main barrier at the beginning was data quality of book cover datasets available online, so I created this one.
For example, this dataset can be used for building recommendation and Content Based Image Retrieval (CBIR) systems.
This CSV file (main_dataset.csv) contains all meta information for each book in the dataset.
In this folder you can find all book covers, sorted into category-based folders, in .jpg format. The dataset contains 33 classes (book categories), each with close to 1k images, so it is fairly balanced. NOTE: Extract this data into a folder called dataset so that it matches the paths provided in the main_dataset.csv file.
All data in this dataset was scraped from https://www.bookdepository.com/. (Not related to them in any way, just a great website 👍)
This dataset contains longitudinal purchases data from 5027 Amazon.com users in the US, spanning 2018 through 2022: amazon-purchases.csv.
It also includes demographic data and other consumer level variables for each user with data in the dataset. These consumer level variables were collected through an online survey and are included in survey.csv.
fields.csv describes the columns in the survey.csv file, where fields/survey columns correspond to survey questions.
The dataset also contains the survey instrument used to collect the data. More details about the survey questions and possible responses, and the format in which they were presented, can be found by viewing the survey instrument.
A 'Survey ResponseID' column is present in both the amazon-purchases.csv and survey.csv files. It links a user's survey responses to their Amazon.com purchases. The 'Survey ResponseID' was randomly generated at the time of data collection.
amazon-purchases.csv
Each row in this file corresponds to an Amazon order. Each such row has the following columns:
- Survey ResponseID
- Order date
- Shipping address state
- Purchase price per unit
- Quantity
- ASIN/ISBN (Product Code)
- Title
- Category
The data were exported by the Amazon users from Amazon.com and shared by users with their informed consent. PII and other information not listed above were stripped from the data. This processing occurred on users' machines before sharing with researchers.
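Since 'Survey ResponseID' links the two files, a typical first step is to aggregate purchases per respondent and attach the result to the survey rows. The sketch below uses only the standard library and small in-memory samples; the header spellings are taken from the column list above and may differ in the real files, and the `example_question` survey column and all sample values are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up samples standing in for amazon-purchases.csv and survey.csv.
PURCHASES_CSV = """Survey ResponseID,Order date,Shipping address state,Purchase price per unit,Quantity,ASIN/ISBN (Product Code),Title,Category
R_demo1,2019-03-02,MA,10.00,2,B000000001,Example item,BOOK
R_demo1,2020-07-15,MA,5.50,1,B000000002,Another item,TOY
R_demo2,2021-01-09,CA,3.25,4,B000000003,Third item,BOOK
"""

SURVEY_CSV = """Survey ResponseID,example_question
R_demo1,answer A
R_demo2,answer B
"""

def spend_per_respondent(purchases_file):
    """Sum price * quantity per Survey ResponseID, skipping rows with blanks."""
    totals = defaultdict(float)
    for row in csv.DictReader(purchases_file):
        price = row["Purchase price per unit"]
        quantity = row["Quantity"]
        if not price or not quantity:
            continue
        totals[row["Survey ResponseID"]] += float(price) * float(quantity)
    return dict(totals)

def join_with_survey(totals, survey_file):
    """Attach each respondent's total spend to their survey row."""
    joined = []
    for row in csv.DictReader(survey_file):
        respondent = row["Survey ResponseID"]
        joined.append({**row, "total_spend": round(totals.get(respondent, 0.0), 2)})
    return joined

totals = spend_per_respondent(io.StringIO(PURCHASES_CSV))
joined = join_with_survey(totals, io.StringIO(SURVEY_CSV))
```

Respondents without any purchase rows get a total of 0.0 here; whether that default is appropriate depends on the analysis.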
BigBox API provides reliable, real-time Home Depot product, category, reviews, and offers data. All data includes comprehensive coverage of each of the search results in a cleanly structured output.
You can originate your request from any US zip code to see results as they would appear to customers in the specified location (for example, shipping info). BigBox API's high-capacity, global infrastructure assures you the highest level of performance and reliability. For easy integration with your Home Depot data apps and services, data is delivered in JSON or CSV format.
Data is retrieved by search term, search results page URL, or for single products, by the Home Depot item ID or by global identifiers such as GTIN, ISBN, UPC and EAN. GTIN-based requests work by looking up the GTIN/ISBN/UPC on Home Depot first, then retrieving the product details for the first matching item ID.
So what's in the data from BigBox API?
Product:
- Item & parent ID
- UPC
- Store SKU
- In-store bay and/or aisle
- Product specifications
- Description
- Imagery
- Product videos
- Buy Box winner: price and fulfillment info
- Rating & reviews count
- Descriptive attributes
Search results:
- Product details per search result
- Position
- Related queries
- Pagination
- Facets
How can BigBox API be used?
- Product listing management
- Price monitoring
- Category & product trends monitoring
- Market research & competitor intelligence
- Location-specific shipping data
- Rank tracking on Home Depot
...and more, depending on your request parameters or the search result.
Who uses BigBox API? This data is leveraged by software developers, marketers & business owners, sales & business development teams, researchers, and data analysts & engineers, in ecommerce, other retail business, agencies and SaaS platforms.
Anyone in your organization who works with your digital presence can develop business intelligence and strategy using this advanced product data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The LKD_raw dataset is an outcome of the SciGRID_gas project.
The data set contains geographical and meta information on the European gas transport network. The data originates from the gas data of the LKD_EU project (ISBN: 978-3-86780-554-4). The original data repository can be found under 10.5281/zenodo.1044462.
The original data has been partially cleaned up and converted to fit the SciGRID_gas project data structure.
The data is stored in both CSV and GeoJSON files.
https://creativecommons.org/publicdomain/zero/1.0/
By Leigh Dodds
The dataset offers insights into various literary works that take place in Bath, providing an opportunity for readers and researchers to explore the rich connections between literature and this historical city. Whether you are interested in local stories or looking for inspiration for your next visit to Bath, this dataset serves as a useful resource.
Each entry includes detailed information such as the unique identifier assigned by LibraryThing (URI), which allows users to access further metadata and book covers using LibraryThing's APIs. Additionally, if available, ISBNs are provided for easy identification of specific editions or versions of each book.
The columns uri, title, isbn, and author are formatted consistently, which keeps the dataset clear and makes analysis efficient.
Dataset Overview
Columns
This dataset consists of four columns that provide important details about each book:
- uri: The unique identifier for each book in the LibraryThing database.
- title: The title of the book.
- isbn: The International Standard Book Number (ISBN) for the book if known.
- author: The author of the book.
Getting Started
Before diving into analyzing or exploring this dataset, it's important to understand its structure and familiarize yourself with its columns and values.
To get started:
- Load/import it into your preferred data analysis tool or programming language (e.g., Python pandas library).
- Follow along with code examples provided below for common tasks using pandas library.
Example Code: Getting Basic Insights
    import pandas as pd

    # Load the CSV file into a pandas DataFrame
    data = pd.read_csv('Library_Thing_Books_Set_in_Bath.csv')

    # Print basic insights about columns and values
    print('Number of rows:', data.shape[0])
    print('Number of columns:', data.shape[1])
    print('Column names:', list(data.columns))
    print('Sample data:')
    print(data.head())

Exploring the Data
Once you have loaded the dataset into your preferred tool, you can begin exploring and analyzing its contents. Here are a few common tasks to get you started:
1. Checking Unique Book Count:

    # Count distinct titles in the dataset
    unique_books = data['title'].nunique()
    print('Number of unique books:', unique_books)

2. Finding Books by a Specific Author:

    # Keep only the rows whose author matches exactly
    author_name = 'Jane Austen'
    books_by_author = data[data['author'] == author_name]
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: Library_Thing_Books_Set_in_Bath.csv
| Column name | Description |
|:------------|:------------|
| uri | The unique identifier for each book in the dataset. (String) |
| title | The title of the book. (String) |
| isbn | The International Standard Book Number (ISBN) for the book, which is a unique identifier for published books. (String) |
| author | The author of the book. (String) |
If you use this dataset in your research, please credit Leigh Dodds.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Abstract: BlogCatalog is a social blog directory that manages bloggers and their blogs.
Number of Nodes: 10,312
Number of Edges: 333,983
Missing Values? no
Source:
Nitin Agarwal(+), Xufei Wang(*), Huan Liu(*)
(+) Department of Information Science, University of Arkansas at Little Rock. E-mail: nxagarwal@ualr.edu
(*) School of Computing, Informatics and Decision Systems Engineering, Arizona State University. E-mail: huan.liu@asu.edu, xufei.wang@asu.edu
Data Set Information: 2 files are included:
1. nodes.csv: the file of all the users. It works as a dictionary of all the users in this data set and is useful for fast reference. It contains all the node ids used in the dataset.
2. edges.csv: the friendship network among the bloggers. Friendships are represented as edges. For example, the line 1,2 means that the blogger with id "1" is a friend of the blogger with id "2".
Attribute Information: This data set was crawled in July 2009 from BlogCatalog (http://www.blogcatalog.com). BlogCatalog is a social blog directory website. The data set contains the crawled friendship network. For easier understanding, all the contents are organized in CSV file format.
Basic statistics:
- Number of bloggers: 88,784
- Number of friendship pairs: 4,186,390
Relevant Papers:
- Nitin Agarwal and Huan Liu. "Modeling and Data Mining in Blogosphere", Synthesis Lectures on Data Mining and Knowledge Discovery #1, Morgan & Claypool Publishers, Robert Grossman (Editor), August 2009. ISBN: 9781598299083 (paperback), ISBN: 9781598299090 (ebook).
- Nitin Agarwal, Magdiel Galan, Huan Liu, and Shankar Subramanya. "WisColl: Collective Wisdom based Blog Clustering", Journal of Information Science, 180(1): 39-61, January 2010.
- Nitin Agarwal, Huan Liu, Sudheendra Murthy, Arunabha Sen, and Xufei Wang. "A Social Identity Approach to Identify Familiar Strangers in a Social Network", In Proceedings of the Third International AAAI Conference on Weblogs and Social Media (ICWSM09), pp. 2 - 9, May 17-20, 2009. San Jose, California.
Nitin Agarwal, Huan Liu, Sudheendra Murthy, Arunabha Sen, and Xufei Wang. "A Social Identity Approach to Identify Familiar Strangers in a Social Network", 3rd International AAAI Conference on Weblogs and Social Media (ICWSM09), pp. 2 - 9, May 17-20, 2009. San Jose, California.
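The edges.csv layout described above (one `id,id` pair per line) maps naturally onto an adjacency structure. The following is a minimal sketch using a small in-memory sample instead of the real file; treating each friendship as undirected is an assumption based on the "is friend with" wording, and the function name `load_friendships` is hypothetical.

```python
import csv
import io
from collections import defaultdict

def load_friendships(edges_file):
    """Build an undirected adjacency map from 'id,id' rows."""
    neighbours = defaultdict(set)
    for row in csv.reader(edges_file):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        a, b = row[0].strip(), row[1].strip()
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours

# Made-up sample in the same format as edges.csv.
sample = io.StringIO("1,2\n1,3\n2,3\n4,1\n")
graph = load_friendships(sample)

# Number of friends per blogger id.
degree = {node: len(friends) for node, friends in graph.items()}
```

For the real file, passing `open("edges.csv", newline="")` in place of the sample should work the same way, provided the file has no header row.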
https://creativecommons.org/publicdomain/zero/1.0/
This dataset consists of over 20,000 books available on Goodreads. The dataset was collected by crawling information directly from the official Goodreads website.
Since December 2020, Goodreads no longer issues new developer keys for its public developer API. Therefore, the goodreads crawler project was created to retrieve raw data from the website.
Note - Raw data alert:
- Duplicates
- Missing values
- Invalid values
- Multi-value columns
- Datetime formats
Features:
- bookID: book's ID
- title: book's title
- authors: list of authors
- description: summary description
- num_ratings: total number of ratings
- num_reviews: total number of reviews
- avg_rating: average rating
- language: languages
- publish date: current book's published date
- first_publish_date: the published date of the first version
- series: book's series
- characters: list of characters
- places: book's places
- awards: list of winning awards
- genres: list of book's genres
- isbn: International Standard Book Number
- isbn13: International Standard Book Number (13 digits)
- rated 5, 4, 3, 2, 1: Number of rated reviews
Please upvote if you find this dataset useful.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Goodreads Book Reviews dataset encapsulates a wealth of reviews and various attributes concerning the books listed on the Goodreads platform. A distinguishing feature of this dataset is its capture of multiple tiers of user interaction, ranging from adding a book to a "shelf", to rating and reading it. This dataset is a treasure trove for those interested in understanding user behavior, book recommendations, sentiment analysis, and the interplay between various attributes of books and user interactions.
Basic Statistics:
- Items: 1,561,465
- Users: 808,749
- Interactions: 225,394,930
Metadata:
- Reviews: The text of the reviews provided by users.
- Add-to-shelf, Read, Review Actions: Various interactions users have with the books.
- Book Attributes: Attributes describing the books, including title and ISBN.
- Graph of Similar Books: A graph depicting similarity relations between books.
Example (interaction data, JSON):
{
"user_id": "8842281e1d1347389f2ab93d60773d4d",
"book_id": "130580",
"review_id": "330f9c153c8d3347eb914c06b89c94da",
"isRead": true,
"rating": 4,
"date_added": "Mon Aug 01 13:41:57 -0700 2011",
"date_updated": "Mon Aug 01 13:42:41 -0700 2011",
"read_at": "Fri Jan 01 00:00:00 -0800 1988",
"started_at": ""
}
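The timestamp fields in the record above use a fixed textual layout, and some of them can be empty strings. A minimal sketch for parsing them with the standard library follows; the format string is inferred from this single example record, so other records may need additional handling, and the helper name `parse_ts` is hypothetical.

```python
import json
from datetime import datetime, timedelta
from typing import Optional

# Layout inferred from the sample record, e.g. "Mon Aug 01 13:41:57 -0700 2011".
GOODREADS_TS = "%a %b %d %H:%M:%S %z %Y"

def parse_ts(value: str) -> Optional[datetime]:
    """Parse a timestamp; empty strings (such as started_at above) become None."""
    if not value:
        return None
    return datetime.strptime(value, GOODREADS_TS)

record = json.loads("""
{
  "user_id": "8842281e1d1347389f2ab93d60773d4d",
  "book_id": "130580",
  "review_id": "330f9c153c8d3347eb914c06b89c94da",
  "isRead": true,
  "rating": 4,
  "date_added": "Mon Aug 01 13:41:57 -0700 2011",
  "date_updated": "Mon Aug 01 13:42:41 -0700 2011",
  "read_at": "Fri Jan 01 00:00:00 -0800 1988",
  "started_at": ""
}
""")

added = parse_ts(record["date_added"])
updated = parse_ts(record["date_updated"])
started = parse_ts(record["started_at"])

# Time between adding the book and the last update of the interaction.
seconds_between = (updated - added).total_seconds()
```

The parsed values are timezone-aware, so differences between fields with different UTC offsets are computed correctly.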
Use Cases:
- Book Recommendations: Creating personalized book recommendations based on user interactions and preferences.
- Sentiment Analysis: Analyzing sentiment in reviews and understanding how different book attributes influence sentiment.
- User Behavior Analysis: Understanding user interaction patterns with books and deriving insights to enhance user engagement.
- Natural Language Processing: Training models to process and analyze user-generated text in reviews.
- Similarity Analysis: Analyzing the graph of similar books to understand book similarities and clustering.
Citation:
Please cite the following if you use the data:
Item recommendation on monotonic behavior chains
Mengting Wan, Julian McAuley
RecSys, 2018
[PDF](https://cseweb.ucsd.edu/~jmcauley/pdfs/recsys18e.pdf)
Code Samples: A curated set of code samples is provided in the dataset's GitHub repository to help with working with the datasets. These include:
- Downloading datasets without GUI: Facilitating dataset download in a non-GUI environment.
- Displaying Sample Records: Showcasing sample records to get a glimpse of the dataset structure.
- Calculating Basic Statistics: Computing basic statistics to understand the dataset's distribution and characteristics.
- Exploring the Interaction Data: Delving into interaction data to grasp user-book interaction patterns.
- Exploring the Review Data: Analyzing review data to extract valuable insights from user reviews.
Additional Dataset:
- Complete book reviews (~15m multilingual reviews about ~2m books and 465k users): This dataset comprises a comprehensive collection of multilingual reviews about around 2 million books from 465,000 users.
Datasets:
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description: This dataset provides a comprehensive collection of data from GoodReads, a popular platform for readers to discover and review books. It includes information on books, book details, and user reviews, offering insights into the literary world.
Content:
- Books: The dataset contains information such as book titles, authors, genres, publication dates, and ISBN numbers. Additionally, it includes details about book collections, including the total number of books in each collection and the total votes received for the collection.
- Book Details: Supplementary details about the books, such as summaries, cover images, formats, publication info, and the number of pages.
- User Reviews: User-generated reviews and ratings for the books, along with information about the reviewers, such as their usernames, follower counts, total reviews written, review dates, and review ratings.
Size: The dataset is approximately 205MB in size.
Format: The dataset files are stored in SQLite(db) and csv format.
Date Range: The dataset covers the period of May 2024.
Source: The data was collected from GoodReads, a leading platform for book lovers to explore, review, and discuss literature.
Purpose: The dataset serves various purposes, including analysis, research, and application in Natural Language Processing (NLP), Deep Learning (DL), and image generation tasks. It can be used to analyze trends in book genres, study user preferences, build recommendation systems, and perform sentiment analysis on user reviews.
Usage: Researchers, data scientists, and enthusiasts can leverage this dataset to gain insights into the literary landscape, understand readers' preferences, and develop novel applications in the field of literature analysis and recommendation systems. Potential use cases include analyzing book trends over time, predicting user preferences based on reviews, and generating book cover images using Deep Learning techniques.
License: This dataset is released under the MIT License, allowing users to freely use, modify, and distribute the data for both commercial and non-commercial purposes.
Example Entries:
- Books:
  - Book Title: "To Kill a Mockingbird", Author: Harper Lee, Genre: Fiction, Publication Date: July 11, 1960, Collection Title: Classic Novels, Total Books in Collection: 100, Total Votes for Collection: 5000
  - Book Title: "1984", Author: George Orwell, Genre: Dystopian Fiction, Publication Date: June 8, 1949, Collection Title: Modern Classics, Total Books in Collection: 75, Total Votes for Collection: 3500
Book Details:
User Reviews:
https://creativecommons.org/publicdomain/zero/1.0/
A list of books in JSON and CSV format with title, author, ISBN, page count, and year. In total, the list contains more than 10,000 books published up to 2021. The dataset was produced to serve as an example for building web service APIs.
Source: public-domain data. The data were collected from the dataset https://www.kaggle.com/datasets/victorstein/livros-skoob created by VICTOR STEIN (2019) and from the SKOOB platform https://www.skoob.com.br/.
Note: since I had to process the table to use it as an example, I thought it was worth sharing the files I generated, keeping a reference to the original author.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Competition: DJ STRIKE 2023
Awarded: 3rd Place
Published: Research Paper (ISBN: 978-93-5768-409-5)
Proceedings: AVISHKAR 2023
Other participations: Schneider Electric GO GREEN Competition
Developed a piezoelectric tile generating an average of 1 mW per step. Data from the sensors (voltage sensor, current sensor) was sent to Firebase for Exploratory Data Analysis (EDA) and pattern identification. The power stored in the battery could charge a phone for 10 minutes.
The dataset was created manually over a period of 3 weeks during prototype and product testing. It was collected in college with the help of students from various branches.
It contains 3 numerical features, voltage (V), current (mA), and weight (kg), and 1 categorical feature: the step location of the person on the square tile (center, edge, or corner).
It also contains null values so be sure to find and drop them with your data analysis skills. Null values occurred due to loose connection of sensors or internal tile connection.
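Dropping the incomplete rows and deriving power per step can be sketched as below. The column names (`voltage`, `current`, `weight`, `location`) and the sample values are hypothetical, since the actual headers are not listed here; the only fact relied on is that voltage in volts multiplied by current in milliamps gives power in milliwatts.

```python
import csv
import io

# Made-up sample with two incomplete rows (blank voltage, blank current).
SAMPLE = """voltage,current,weight,location
2.0,0.5,60,center
,0.4,72,edge
1.5,0.5,55,corner
3.0,,80,center
"""

def clean_rows(csv_file):
    """Return only complete rows, with numeric fields converted to float."""
    rows = []
    for row in csv.DictReader(csv_file):
        # A missing or blank field marks the row as incomplete.
        if any(value is None or value.strip() == "" for value in row.values()):
            continue
        rows.append({
            "voltage_v": float(row["voltage"]),
            "current_ma": float(row["current"]),
            "weight_kg": float(row["weight"]),
            "location": row["location"],
        })
    return rows

rows = clean_rows(io.StringIO(SAMPLE))

# Volts times milliamps gives milliwatts.
for r in rows:
    r["power_mw"] = r["voltage_v"] * r["current_ma"]
```

Counting how many rows were dropped per step location can help check whether the loose sensor connections affected one part of the tile more than others.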
Source:
Uncalibrated accelerometer data were collected from 15 participants performing 7 activities. The dataset provides challenges for identification and authentication of people using motion patterns.
Data Set Information:
- The dataset collects data from a wearable accelerometer mounted on the chest
- Sampling frequency of the accelerometer: 52 Hz
- Accelerometer data are uncalibrated
- Number of participants: 15
- Number of activities: 7
- Data format: CSV
Attribute Information:
- Data are separated by participant
- Each file contains the following information: sequential number, x acceleration, y acceleration, z acceleration, label
- Labels are codified by numbers:
  - 1: Working at Computer
  - 2: Standing Up, Walking and Going updown stairs
  - 3: Standing
  - 4: Walking
  - 5: Going UpDown Stairs
  - 6: Walking and Talking with Someone
  - 7: Talking while Standing
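Given the row layout and the 52 Hz sampling frequency stated above, the time spent in each activity can be estimated from the number of samples per label. The sketch below assumes files without a header row and uses generated sample rows rather than real recordings; the function name `seconds_per_activity` is hypothetical.

```python
import csv
import io
from collections import Counter

SAMPLING_HZ = 52  # sampling frequency stated in the dataset description

ACTIVITIES = {
    1: "Working at Computer",
    2: "Standing Up, Walking and Going updown stairs",
    3: "Standing",
    4: "Walking",
    5: "Going UpDown Stairs",
    6: "Walking and Talking with Someone",
    7: "Talking while Standing",
}

def seconds_per_activity(csv_file):
    """Count samples per label and convert the counts to seconds."""
    counts = Counter()
    for row in csv.reader(csv_file):
        if len(row) != 5:
            continue  # expect: sequential number, x, y, z, label
        counts[int(float(row[4]))] += 1
    return {
        ACTIVITIES.get(label, f"unknown ({label})"): n / SAMPLING_HZ
        for label, n in counts.items()
    }

# Generated sample: 104 rows with label 1 followed by 52 rows with label 4.
lines = [f"{i},1900,2400,2000,{1 if i < 104 else 4}" for i in range(156)]
durations = seconds_per_activity(io.StringIO("\n".join(lines)))
```

Labels outside the 1 to 7 range are reported as unknown rather than dropped, so unexpected codes remain visible.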
Relevant Papers:
- Casale, P. Pujol, O. and Radeva, P. 'BeaStreamer-v0.1: a new platform for Multi-Sensors Data Acquisition in Wearable Computing Applications', CVCRD09, ISBN: 978-84-937261-1-9, 2009
- Casale, P. Pujol, O. and Radeva, P. 'Human activity recognition from accelerometer data using a wearable device', IbPRIA'11, 289-296, Springer-Verlag, 2011
- Casale, P. Pujol, O. and Radeva, P. 'Personalization and user verification in wearable systems using biometric walking patterns' Personal and Ubiquitous Computing, 16(5), 563-580, 2012
Citation Request:
Casale, P. Pujol, O. and Radeva, P. 'Personalization and user verification in wearable systems using biometric walking patterns' Personal and Ubiquitous Computing, 16(5), 563-580, 2012