53 datasets found
  1. Books Dataset

    • kaggle.com
    zip
    Updated Dec 20, 2023
    Cite
    Elvin Rustamov (2023). Books Dataset [Dataset]. https://www.kaggle.com/datasets/elvinrustam/books-dataset
    Explore at:
    zip (55,469,565 bytes)
    Dataset updated
    Dec 20, 2023
    Authors
    Elvin Rustamov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Overview: This dataset comprises information scraped from wonderbk.com, a popular online bookstore. The dataset contains details of 103,063 books, with key attributes such as title, authors, description, category, publisher, starting price, and publish date.

    Columns:

    • Title: The title of the book.
    • Authors: The authors of the book.
    • Description: A brief description of the book.
    • Category: The category or genre to which the book belongs.
    • Publisher: The publishing house responsible for the book.
    • Price Starting With ($): The initial price of the book.
    • Publish Date (Month): The month in which the book was published.
    • Publish Date (Year): The year of publication.
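    A minimal sketch of loading these columns with Python's standard csv module; the two sample rows are invented for illustration, and the actual CSV filename inside the Kaggle zip may differ:

```python
import csv
import io

# Hypothetical two-row sample with the columns listed above; the real
# file ships inside the Kaggle zip (exact filename may differ).
sample = io.StringIO(
    "Title,Authors,Description,Category,Publisher,"
    "Price Starting With ($),Publish Date (Month),Publish Date (Year)\n"
    "Example Book,Jane Doe,A short description.,Fiction,Acme Press,9.99,March,2001\n"
    "Another Book,John Roe,Another description.,History,Beta Books,14.50,July,1998\n"
)

books = list(csv.DictReader(sample))

# Price and year arrive as strings and need converting before analysis.
for book in books:
    book["Price Starting With ($)"] = float(book["Price Starting With ($)"])
    book["Publish Date (Year)"] = int(book["Publish Date (Year)"])

print(books[0]["Title"], books[0]["Publish Date (Year)"])
```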
  2. Best Books Ever Dataset

    • zenodo.org
    csv
    Updated Nov 10, 2020
    + more versions
    Cite
    Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
    Explore at:
    csv
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lorena Casanova Lozano; Sergio Costa Planells
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset has been collected as part of the Prac1 assignment of the subject Typology and Data Life Cycle of the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).

    The dataset contains 25 variables and 52,478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).

    The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset

    The data was retrieved in two sets: the first 30,000 books and then the remaining 22,478. Dates were not parsed and reformatted for the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30,000 records and in Month Day Year format for the rest.
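    A small helper can normalize the two date styles when loading the data; this is a sketch, and the exact spelling of the "Month Day Year" style (e.g. whether the day is zero-padded) is an assumption:

```python
from datetime import datetime

def parse_publish_date(value):
    """Normalize the two date styles described above: mm/dd/yyyy for the
    first chunk, 'Month Day Year' for the rest (exact spelling assumed)."""
    for fmt in ("%m/%d/%Y", "%B %d %Y"):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            pass
    return None  # unparseable or missing entry

print(parse_publish_date("09/14/2008"), parse_publish_date("June 26 2003"))
```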

    Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repo.

    The 25 fields of the dataset are:

    | Attributes | Definition | Completeness (%) |
    | ------------- | ------------- | ------------- |
    | bookId | Book Identifier as in goodreads.com | 100 |
    | title | Book title | 100 |
    | series | Series Name | 45 |
    | author | Book's Author | 100 |
    | rating | Global goodreads rating | 100 |
    | description | Book's description | 97 |
    | language | Book's language | 93 |
    | isbn | Book's ISBN | 92 |
    | genres | Book's genres | 91 |
    | characters | Main characters | 26 |
    | bookFormat | Type of binding | 97 |
    | edition | Type of edition (e.g., Anniversary Edition) | 9 |
    | pages | Number of pages | 96 |
    | publisher | Publishing house | 93 |
    | publishDate | Publication date | 98 |
    | firstPublishDate | Publication date of first edition | 59 |
    | awards | List of awards | 20 |
    | numRatings | Number of total ratings | 100 |
    | ratingsByStars | Number of ratings by stars | 97 |
    | likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
    | setting | Story setting | 22 |
    | coverImg | URL to cover image | 99 |
    | bbeScore | Score in Best Books Ever list | 100 |
    | bbeVotes | Number of votes in Best Books Ever list | 100 |
    | price | Book's price (extracted from Iberlibro) | 73 |

  3. Bookplate Registry Database

    • curate.nd.edu
    bin
    Updated Dec 15, 2023
    Cite
    Rare Books & Special Collections (2023). Bookplate Registry Database [Dataset]. http://doi.org/10.7274/r0-yq4p-t907
    Explore at:
    binAvailable download formats
    Dataset updated
    Dec 15, 2023
    Dataset provided by
    University of Notre Dame
    Authors
    Rare Books & Special Collections
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    The bookplate registry database focuses on the bookplates that are pasted into the front matter of a book to show ownership. The bookplate registry is a searchable database image catalog of approximately 1100 sample bookplates and library stamps from Hesburgh Libraries Rare Books and Special Collections at the University of Notre Dame. The database was created during preliminary explorations of the cataloging and database methodology necessary to support a cooperative online bookplate registry for multiple universities. The database focuses on both the owners of the books and the artists who created the bookplate designs. The attached files include a PowerPoint presentation given by Christian Dupont at the 41st Annual Preconference of the Rare Books and Manuscripts Section of the American Library Association in Chicago, Illinois on July 7, 2000. The presentation explains the project in more detail and the data that was collected. The dataset gives information on the bookplates that were reviewed at the University of Notre Dame Hesburgh Libraries Rare Books and Special Collections. The original site on which this information was searchable was retired in the Fall of 2021.

  4. SQLite Sakila Sample Database

    • kaggle.com
    zip
    Updated Mar 14, 2021
    Cite
    Atanas Kanev (2021). SQLite Sakila Sample Database [Dataset]. https://www.kaggle.com/datasets/atanaskanev/sqlite-sakila-sample-database/code
    Explore at:
    zip (4,495,190 bytes)
    Dataset updated
    Mar 14, 2021
    Authors
    Atanas Kanev
    Description

    SQLite Sakila Sample Database

    Database Description

    The Sakila sample database is a fictitious database designed to represent a DVD rental store. The tables of the database include film, film_category, actor, customer, rental, payment and inventory among others. The Sakila sample database is intended to provide a standard schema that can be used for examples in books, tutorials, articles, samples, and so forth. Detailed information about the database can be found on the MySQL website: https://dev.mysql.com/doc/sakila/en/

    Sakila for SQLite is a part of the sakila-sample-database-ports project intended to provide ported versions of the original MySQL database for other database systems, including:

    • Oracle
    • SQL Server
    • SQLite
    • Interbase/Firebird
    • Microsoft Access

    Sakila for SQLite is a port of the Sakila example database available for MySQL, which was originally developed by Mike Hillyer of the MySQL AB documentation team. This project is designed to help database administrators decide which database to use for development of new products: the user can run the same SQL against different kinds of databases and compare the performance.

    License: BSD Copyright DB Software Laboratory http://www.etl-tools.com

    Note: Part of the insert scripts were generated by Advanced ETL Processor http://www.etl-tools.com/etl-tools/advanced-etl-processor-enterprise/overview.html

    Information about the project and the downloadable files can be found at: https://code.google.com/archive/p/sakila-sample-database-ports/

    Other versions and developments of the project can be found at: https://github.com/ivanceras/sakila/tree/master/sqlite-sakila-db

    https://github.com/jOOQ/jOOQ/tree/main/jOOQ-examples/Sakila

    Direct access to the MySQL Sakila database, which does not require installation of MySQL (queries can be typed directly in the browser), is provided on the phpMyAdmin demo version website: https://demo.phpmyadmin.net/master-config/

    Files Description

    The files in the sqlite-sakila-db folder are the script files which can be used to generate the SQLite version of the database. For convenience, the script files have already been run in cmd to generate the sqlite-sakila.db file, as follows:

    sqlite> .open sqlite-sakila.db              # creates the .db file
    sqlite> .read sqlite-sakila-schema.sql      # creates the database schema
    sqlite> .read sqlite-sakila-insert-data.sql # inserts the data

    Therefore, the sqlite-sakila.db file can be directly loaded into SQLite3 and queries can be directly executed. You can refer to my notebook for an overview of the database and a demonstration of SQL queries. Note: Data about the film_text table is not provided in the script files, thus the film_text table is empty. Instead the film_id, title and description fields are included in the film table. Moreover, the Sakila Sample Database has many versions, so an Entity Relationship Diagram (ERD) is provided to describe this specific version. You are advised to refer to the ERD to familiarise yourself with the structure of the database.
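    Once loaded, the .db file can be queried from any SQLite client; a minimal Python sketch, shown here against an in-memory stand-in with two invented rows (the real file would be opened with sqlite3.connect("sqlite-sakila.db"), and the real film table has more columns):

```python
import sqlite3

# In-memory stand-in for sqlite-sakila.db with a simplified film table;
# film_id, title and description live on the film table in this port
# (the film_text table ships empty, as noted above).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, description TEXT);
    INSERT INTO film VALUES (1, 'ACADEMY DINOSAUR', 'An epic drama'),
                            (2, 'ACE GOLDFINGER', 'An astounding tale');
""")

rows = conn.execute("SELECT title FROM film ORDER BY title LIMIT 5").fetchall()
print(rows)
```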

  5. Comprehensive Drug Self-administration and Discrimination Bibliographic Databases

    • neuinfo.org
    • scicrunch.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). Comprehensive Drug Self-administration and Discrimination Bibliographic Databases [Dataset]. http://identifiers.org/RRID:SCR_000707
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Database of bibliographic details of over 9,000 references published between 1951 and the present day, including abstracts, journal articles, book chapters and books, replacing the two former separate websites for Ian Stolerman's drug discrimination database and Dick Meisch's drug self-administration database. Lists of standardized keywords are used to index the citations. Most of the keywords are generic drug names, but they also include methodological terms, species studied and drug classes. This index makes it possible to selectively retrieve references according to the drugs used as the training stimuli, drugs used as test stimuli, drugs used as pretreatments, species, etc., by entering your own terms or by using our comprehensive lists of search terms.

    Drug Discrimination: Drug discrimination is widely recognized as one of the major methods for studying the behavioral and neuropharmacological effects of drugs and plays an important role in drug discovery and investigations of drug abuse. In drug discrimination studies, effects of drugs serve as discriminative stimuli that indicate how reinforcers (e.g. food pellets) can be obtained. For example, animals can be trained to press one of two levers to obtain food after receiving injections of a drug, and to press the other lever to obtain food after injections of the vehicle. After the discrimination has been learned, the animal starts pressing the appropriate lever according to whether it has received the training drug or vehicle; accuracy is very good in most experiments (90% or more correct). Discriminative stimulus effects of drugs are readily distinguished from the effects of food alone by collecting data in brief test sessions where responses are not differentially reinforced. Thus, trained subjects can be used to determine whether test substances are identified as like or unlike the drug used for training.

    Drug Self-administration: Drug self-administration methodology is central to the experimental analysis of drug abuse and dependence (addiction). It constitutes a key technique in numerous investigations of drug intake and its neurobiological basis and has even been described by some as the gold standard among methods in the area. Self-administration occurs when, after a behavioral act or chain of acts, a feedback loop results in the introduction of a drug or drugs into a human or infra-human subject. The drug is usually conceptualized as serving the role of a positive reinforcer within a framework of operant conditioning. For example, animals can be given the opportunity to press a lever to obtain an infusion of a drug through a chronically indwelling venous catheter. If the available dose of the drug serves as a positive reinforcer, then the rate of lever-pressing will increase and a sustained pattern of responding at a high rate may develop. Reinforcing effects of drugs are distinguishable from other actions, such as increases in general activity, by means of one or more control procedures. Trained subjects can be used to investigate the behavioral and neuropharmacological basis of drug-taking and drug-seeking behaviors and the reinstatement of these behaviors in subjects with a previous history of drug intake (relapse models). Other applications include evaluating novel compounds for liability to produce abuse and dependence and for their value in the treatment of drug dependence and addiction.

    The bibliography is updated about four times per year.

  6. Books dataset, ISBN based

    • kaggle.com
    zip
    Updated Oct 13, 2025
    Cite
    Goulven Furet (2025). Books dataset, ISBN based [Dataset]. https://www.kaggle.com/datasets/goulvenfuret/books-dataset-isbn-based
    Explore at:
    zip (367,961,043 bytes)
    Dataset updated
    Oct 13, 2025
    Authors
    Goulven Furet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Nudger is a responsible price comparison tool registered as a public good. This project follows an open-source and open-data approach, featuring open datasets on books and products that are accessible to everyone.

    The data shared by Nudger primarily covers the French market.

    ISBN Dataset: Contains information on over 6 million books identified by their ISBN numbers.
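    Records keyed by ISBN can be sanity-checked with the standard ISBN-13 check-digit rule (digits weighted alternately 1 and 3 must sum to a multiple of 10); a minimal sketch, not part of the Nudger tooling:

```python
def is_valid_isbn13(isbn):
    """Check the ISBN-13 check digit (weights alternate 1 and 3)."""
    digits = [c for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum((1 if i % 2 == 0 else 3) * int(d) for i, d in enumerate(digits))
    return total % 10 == 0

print(is_valid_isbn13("978-0-306-40615-7"))  # a well-known valid example
```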

    Nudger is an open and growing project. Feel free to contact us with any questions!

  7. Text files from Gutenberg database

    • zenodo.org
    zip
    Updated Jan 24, 2020
    Cite
    Antonis Michalas (2020). Text files from Gutenberg database [Dataset]. http://doi.org/10.5281/zenodo.3360392
    Explore at:
    zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Antonis Michalas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Text files of different sizes and structures. More precisely, we selected random data from the Gutenberg dataset.

    This artefact contains five different datasets with random text files (i.e. e-books in .txt format) from the Gutenberg database. The datasets that we selected ranged from text files with a total size of 184MB to a set of text files with a total size of 1.7GB.

    More precisely, the following datasets can be found in this package:

    1. 184MB
    2. 357MB
    3. 670MB
    4. 1GB
    5. 1.7GB

    In our case, we used this dataset to perform extensive experiments regarding the performance of a Symmetric Searchable Encryption scheme. However, it can also be used to measure the performance of any algorithm that parses documents, extracts keywords, creates dictionaries, etc.
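    As an illustration of the kind of parsing benchmark described above, a toy sketch that times a naive keyword-frequency pass over one text; the sample string stands in for an e-book .txt file read from the chosen dataset directory:

```python
import collections
import re
import time

def keyword_counts(text, top=3):
    """Toy keyword extraction: lowercase word frequencies."""
    words = re.findall(r"[a-z']+", text.lower())
    return collections.Counter(words).most_common(top)

# Stand-in for one e-book; in practice you would read each .txt file
# from the chosen dataset (184MB ... 1.7GB) and aggregate the timings.
sample = "It was the best of times, it was the worst of times."

start = time.perf_counter()
top_words = keyword_counts(sample)
elapsed = time.perf_counter() - start
print(top_words, f"({elapsed:.6f}s)")
```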

  8. Goodreads Book Reviews

    • cseweb.ucsd.edu
    json
    + more versions
    Cite
    UCSD CSE Research Project, Goodreads Book Reviews [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Explore at:
    json
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    These datasets contain reviews from the Goodreads book review website, along with a variety of attributes describing the items. Critically, these datasets capture multiple levels of user interaction, ranging from adding a book to a shelf, to rating it, to reading it.

    Metadata includes

    • reviews

    • add-to-shelf, read, review actions

    • book attributes: title, isbn

    • graph of similar books

    Basic Statistics:

    • Items: 1,561,465

    • Users: 808,749

    • Interactions: 225,394,930
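    The UCSD dumps are distributed as JSON records, typically one per line; a minimal parsing sketch, with two invented records standing in for a real file (the field names here are illustrative assumptions, not the exact schema):

```python
import json

# Two toy JSON-lines records standing in for a downloaded dump file;
# field names (book_id, rating, is_read) are assumptions for illustration.
lines = [
    '{"book_id": "1", "rating": 5, "is_read": true}',
    '{"book_id": "2", "rating": 3, "is_read": false}',
]

records = [json.loads(line) for line in lines]
read_count = sum(r["is_read"] for r in records)  # True counts as 1
print(len(records), read_count)
```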

  9. Using pseudoalignment and base quality to accurately quantify microbial community composition

    • plos.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Mark Reppell; John Novembre (2023). Using pseudoalignment and base quality to accurately quantify microbial community composition [Dataset]. http://doi.org/10.1371/journal.pcbi.1006096
    Explore at:
    pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Mark Reppell; John Novembre
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pooled DNA from multiple unknown organisms arises in a variety of contexts, for example microbial samples from ecological or human health research. Determining the composition of pooled samples can be difficult, especially at the scale of modern sequencing data and reference databases. Here we propose a novel method for taxonomic profiling in pooled DNA that combines the speed and low-memory requirements of k-mer based pseudoalignment with a likelihood framework that uses base quality information to better resolve multiply mapped reads. We apply the method to the problem of classifying 16S rRNA reads using a reference database of known organisms, a common challenge in microbiome research. Using simulations, we show the method is accurate across a variety of read lengths, with different length reference sequences, at different sample depths, and when samples contain reads originating from organisms absent from the reference. We also assess performance in real 16S data, where we reanalyze previous genetic association data to show our method discovers a larger number of quantitative trait associations than other widely used methods. We implement our method in the software Karp, for k-mer based analysis of read pools, to provide a novel combination of speed and accuracy that is uniquely suited for enhancing discoveries in microbial studies.

  10. Data from: USGS North American Packrat Midden Database, Version 5.0

    • catalog.data.gov
    • data.usgs.gov
    • +3more
    Updated Nov 21, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). USGS North American Packrat Midden Database, Version 5.0 [Dataset]. https://catalog.data.gov/dataset/usgs-north-american-packrat-midden-database-version-5-0
    Explore at:
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This data release contains the data tables for the USGS North American Packrat Midden Database (version 5.0). This version of the Midden Database contains data for 3,331 packrat midden samples obtained from published sources (journal articles, book chapters, theses, dissertations, government and private industry reports, conference proceedings) as well as unpublished data contributed by researchers. Compared to the previous version of the Midden Database (i.e., ver. 4), this version of the database (ver. 5.0) has been expanded to include more precise midden-sample site location data, calibrated midden-sample age data, and plant functional type (PFT) assignments for the taxa in each midden sample. In addition, World Wildlife Fund ecoregion and major habitat type (MHT) assignments (Ricketts and others, 1999, Terrestrial ecoregions of North America—A conservation assessment) and modern climate and bioclimate data (New and others, 2002; Davis and others, 2017) are provided for each midden-sample site location.

  11. Percentage errors for popularity change predictions in Twitter database.

    • figshare.com
    xls
    Updated Jun 6, 2023
    Cite
    Anna Tovo; Samuele Stivanello; Amos Maritan; Samir Suweis; Stefano Favaro; Marco Formentin (2023). Percentage errors for popularity change predictions in Twitter database. [Dataset]. http://doi.org/10.1371/journal.pone.0253461.t002
    Explore at:
    xls
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Anna Tovo; Samuele Stivanello; Amos Maritan; Samir Suweis; Stefano Favaro; Marco Formentin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For a fixed L = 25 and different values of K (first and second columns), we estimated, from ten different Twitter sub-samples (p* = 5%), the number of species having abundance at least K at the unobserved scale 1 − p* = 95%, given that they have abundance at least L at the sampled scale p*, via estimator (4). The averages among the ten sub-samples of the true numbers of species, S1−p*(≥K|≥L), and of those predicted by our method are displayed in the third and fourth columns, respectively. Finally, in the last two columns, we report the mean and the variance of the relative error obtained in the ten predictions. Similar results have been obtained for other values of L and K (see S4 Table in S1 Appendix).

  12. Alcohol and Alcohol Problems Science Database

    • scicrunch.org
    • rrid.site
    • +1more
    Updated Dec 4, 2023
    Cite
    (2023). Alcohol and Alcohol Problems Science Database [Dataset]. http://identifiers.org/RRID:SCR_003768
    Explore at:
    Dataset updated
    Dec 4, 2023
    Description

    Portal to support researchers and practitioners searching for information related to alcohol research including links to a number of databases, journals, and Web sites focused on alcohol research and related topics. Also included is a link to the archived ETOH database, the premier Alcohol and Alcohol Problems Science Database, which contains over 130,000 records and covers the period from 1972 through 2003. Included in ETOH are abstracts and bibliographic references to journal articles, books, dissertation abstracts, conference papers and proceedings, reports and studies, and chapters in edited works. ETOH's scope reflects the multidisciplinary nature of the alcohol research field. The range of subject areas contained in ETOH includes: medicine, biochemistry, psychology, psychiatry, epidemiology, sociology, anthropology, treatment, prevention, education, accidents and safety, legislation, criminal justice, public policy, and health services research. The ETOH database is indexed with vocabulary from the Alcohol and Other Drug Thesaurus: A Guide to Concepts and Terminology in Substance Abuse and Addiction (AOD Thesaurus), Third Edition. More than 5,000 terms in the AOD Thesaurus are used as ETOH descriptors. The Databases/Resources section includes databases and resources for alcohol researchers and practitioners. It includes an introduction to the National Library of Medicine's PubMed and some sample searches on alcohol to run in the PubMed database; descriptions of and links to the various databases of the National Clearinghouse for Alcohol and Drug Information (NCADI); a selection of alcohol and other drug databases with their descriptions and links; links to peer-reviewed journals most often used by alcohol researchers; and links to a selection of Web sites pertinent to the substance abuse field.

  13. MARHYS Database 2.0 - Vdataset - LDM

    • service.tib.eu
    Updated Nov 29, 2024
    + more versions
    Cite
    (2024). MARHYS Database 2.0 - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-935649
    Explore at:
    Dataset updated
    Nov 29, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is version 2.0 of the MARHYS (MARine HYdrothermal Solutions) database. The database is a global compilation of marine hydrothermal vent fluid compositions. It includes data presented in journal publications, book chapters and monographs, and covers compositions of hydrothermal fluids, end members and background seawater. Detailed sample information (metadata) is provided along with the chemical composition, which enables the unique identification of discrete vent fluid samples. The database contains concentrations of major, minor and trace elements, as well as reduced carbon compounds and isotope compositions. All concentrations are converted into units per kg of vent fluid, and chemical data of unique samples reported in different publications are merged into unique sample entries. In brief, the following changes have been applied in the course of compiling MARHYS database version 2.0 compared to version 1.0:

    • A large number of new vent sites from the following regions are now incorporated in the database: southern Mid Atlantic Ridge, Arctic Mid Ocean Ridge, Juan de Fuca Ridge, Okinawa Trough, Lau Basin, Manus Basin, and some more (the database has grown by nearly 50% and now contains 3394 merged entries, compared to 2374 merged entries in version 1.0).
    • The complete sample information section for a large number of entries was overhauled to achieve more consistency between different datasets and within individual vent sites (additional sample IDs have been adjusted to fit the database naming scheme and to ensure unique sample entries).
    • Some new species were included.
    • A small number of faulty concentrations were detected and corrected.
    • A number of vent areas in the Manus Basin were erroneously classified in version 1.0 and are now re-classified as back-arc spreading centres (Susu, Desmos, PACMANUS).
    • The layout of the spreadsheet was improved by freezing the "Sample ID" column as well as the parameter and unit rows, to give users more comfort when browsing the database. The built-in Excel filter function is now turned on by default (row 14), enabling users to more easily generate sub-datasets of interest.

  14. Child Care and Development Fund (CCDF) Policies Database, United States, 2009-2022

    • childandfamilydataarchive.org
    ascii, delimited +5
    Updated Nov 27, 2023
    + more versions
    Cite
    Minton, Sarah; Dwyer, Kelly; Todd, Margaret; Kwon, Danielle (2023). Child Care and Development Fund (CCDF) Policies Database, United States, 2009-2022 [Dataset]. http://doi.org/10.3886/ICPSR38908.v1
    Explore at:
    excel, r, stata, ascii, sas, spss, delimited
    Dataset updated
    Nov 27, 2023
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    Minton, Sarah; Dwyer, Kelly; Todd, Margaret; Kwon, Danielle
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/38908/terms

    Time period covered
    Jan 1, 2009 - Dec 31, 2022
    Area covered
    United States
    Description

    The Child Care and Development Fund (CCDF) provides federal money to states and territories to help low-income families obtain quality child care so that parents can work, attend training, or pursue education. Within broad federal parameters, states and territories set the detailed policies. Those details determine whether a particular family is eligible for subsidies, how much the family must pay for care, how families apply for and retain subsidies, the maximum amounts that child care providers are reimbursed, and the administrative procedures that providers must follow. Thus, while CCDF is a single program from the perspective of federal law, in practice it is a different program in every state and territory. The CCDF Policies Database project is a comprehensive, up-to-date database of CCDF policy information that supports the needs of a variety of audiences through (1) analytic data files, (2) a project website and search tool, and (3) an annual report (Book of Tables). These resources are made available to researchers, administrators, and policymakers with the goal of addressing important questions concerning the effects of child care subsidy policies and practices on the children and families served. A description of the data files, project website and search tool, and Book of Tables is provided below:
    1. Detailed, longitudinal analytic data files provide CCDF policy information for all 50 states, the District of Columbia, and the United States territories and outlying areas. They capture the policies actually in effect at a point in time, rather than proposals or legislation, and record changes throughout each year, allowing users to access the policies in place at any point between October 2009 and the most recent data release. The data are organized into 32 categories, with each category of variables separated into its own dataset. The categories span five general areas of policy:
    • Eligibility Requirements for Families and Children (Datasets 1-5)
    • Family Application, Terms of Authorization, and Redetermination (Datasets 6-13)
    • Family Payments (Datasets 14-18)
    • Policies for Providers, Including Maximum Reimbursement Rates (Datasets 19-27)
    • Overall Administrative and Quality Information Plans (Datasets 28-32)
    The information in the data files is based primarily on the documents that caseworkers use as they work with families and providers (often termed "caseworker manuals"). The caseworker manuals generally provide much more detailed information on eligibility, family payments, and provider-related policies than the CCDF Plans submitted by states and territories to the federal government, and they provide ongoing detail for the periods between CCDF Plan dates. Each dataset contains a series of variables designed to capture the intricacies of the rules covered in its category. The variables include a mix of categorical, numeric, and text variables. Most variables have a corresponding notes field to capture additional details, and each category has a further notes field for any information about the rules not already outlined in the category's variables. Beginning with the 2020 files, the analytic data files are supplemented by four additional data files containing select policy information featured in the annual reports (prior to 2020, the full detail of the annual reports was reproduced as data files). These supplemental files (Datasets 33-36) present key aspects of the differences in CCDF-funded programs across all states and territories as of October 1 of each year (2009-2022). They include variables calculated from several variables in the analytic data files (Datasets 1-32), such as copayment amounts for example family situations, and information that is part of the annual project reports (the Book of Tables) but not stored in the full database, such as summary market rate survey information from the CCDF Plans.
    2. The project website and search tool provide access to a point-and-click user interface. Users can select from the full set of public data to create custom tables. The website also provides access to the full range of reports and products released under the CCDF Policies Data
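The point-in-time design described above can be sketched in miniature: each record carries the window of dates during which a policy value was in effect, and a lookup selects the record covering a given date. This is an illustrative sketch only; the field names (`state`, `effective`, `expires`, `copay_model`) are invented and do not come from the actual CCDF files.

```python
from datetime import date

# Hypothetical longitudinal records: each row carries the span of dates
# during which a policy value was in effect (column names are assumed).
records = [
    {"state": "AL", "effective": date(2009, 10, 1),
     "expires": date(2011, 6, 30), "copay_model": "flat"},
    {"state": "AL", "effective": date(2011, 7, 1),
     "expires": date(9999, 12, 31), "copay_model": "sliding"},
]

def policy_in_effect(rows, state, on):
    """Return the record whose effective window covers the date `on`, if any."""
    for row in rows:
        if row["state"] == state and row["effective"] <= on <= row["expires"]:
            return row
    return None

# A mid-2010 query lands in the first window; a 2012 query in the second.
mid_2010 = policy_in_effect(records, "AL", date(2010, 1, 15))
```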

  15. Books Sales and Ratings

    • kaggle.com
    zip
    Updated Dec 6, 2023
    The Devastator (2023). Books Sales and Ratings [Dataset]. https://www.kaggle.com/datasets/thedevastator/books-sales-and-ratings
    Explore at:
    zip(54505 bytes)Available download formats
    Dataset updated
    Dec 6, 2023
    Authors
    The Devastator
    Description

    Books Sales and Ratings

    Books Dataset: Analyzing Sales, Ratings, and Genres

    By Josh Murrey [source]

    About this dataset

    The Books Dataset: Sales, Ratings, and Publication provides comprehensive information on various aspects of books, including their publishing year, author details, ratings given by readers, sales performance data, and genre classification. The dataset consists of several key columns that capture important attributes related to each book.

    The Publishing Year column indicates the year in which each book was published. This information helps in understanding the chronological distribution of books in the dataset.

    The Book Name column contains the titles of the books. Each book has a unique name that distinguishes it from others in the dataset.

    The Author column specifies the name(s) of the author(s) responsible for creating each book. This information is crucial for understanding different authors' contributions and analyzing their impact on sales and ratings.

    The language_code column represents a specific code assigned to indicate the language in which each book is written. This code serves as a reference point for language-based analysis within the dataset.

    Each author's rating is captured in the Author_Rating column. This rating is based on their previous works and serves as an indicator of their reputation or acclaim among readers.

    The average rating given by readers for each book is recorded in the Book_average_rating column. This value reflects how well-received a particular book is by its audience.

    The number of ratings given to each book by readers can be found in the Book_ratings_count column. This metric helps gauge reader engagement and provides insights into popular or widely-discussed books within this dataset.

    Books are classified into different genres or categories which are mentioned under the genre column. Genre classification allows for analyzing trends across specific literary genres or identifying patterns related to certain types of books.

    Sales-related data includes both gross sales revenue (gross sales) generated by each book and publisher revenue (publisher revenue) earned from these sales transactions. These numeric values provide insights into financial performance aspects associated with the book market.

    The sale price column denotes the specific price at which each book is sold. This information helps evaluate pricing strategies and their potential impact on sales figures.

    Sales performance is further quantified through the sales rank column, which assigns a numerical rank to each book based on its sales performance. This ranking system aids in identifying high-performing books within the dataset.

    Lastly, the units sold column captures the number of units of each book that have been sold. This data highlights popular books based on reader demand and serves as a crucial measure of commercial success within the dataset.

    Overall, this expansive and comprehensive Books Dataset

    How to use the dataset

    Introduction:

    • Getting Familiar with the Columns: The dataset contains multiple columns that provide different kinds of information:

    • Book Name: The title of each book.

    • Author: The name of the author who wrote the book.

    • language_code: The code representing the language in which the book is written.

    • Author_Rating: The rating assigned to the author based on their previous works.

    • Book_average_rating: The average rating given to the book by readers.

    • Book_ratings_count: The number of ratings given to the book by readers.

    • genre: The genre or category to which the book belongs.

    • gross sales: The total sales revenue generated by each book.

    • publisher revenue: The revenue earned by publishers from selling each book.

    • sale price: The price at which each copy of a book is sold.

    • sales rank: A numeric value indicating a book's rank based on its sales performance in comparison to other books within its category (genre).

    • units sold: Total number of copies sold for each specific title.

    • Understanding Numeric and Textual Data: Numeric columns in this dataset include Publishing Year, Author_Rating, Book_average_rating, Book_ratings_count, gross sales, publisher revenue, sale price, sales rank, and units sold; these provide quantitative insights that can be used for statistical analysis and comparisons.

    Additionally, the columns 'Author', 'Book Name', and 'genre' contain textual data that provides descriptive elements such as authors' names and categorization genres.

    • Exploring Relationships Between Data Points: By combining different co...
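As a small sketch of the kind of analysis these columns support, the snippet below computes the mean Book_average_rating per genre and picks the best-selling title by units sold. The rows are toy values invented for illustration, not taken from the dataset itself.

```python
from collections import defaultdict

# Toy rows mimicking the dataset's columns (values invented for illustration).
rows = [
    {"Book Name": "A", "genre": "fiction", "Book_average_rating": 4.2, "units sold": 35000},
    {"Book Name": "B", "genre": "fiction", "Book_average_rating": 3.8, "units sold": 12000},
    {"Book Name": "C", "genre": "nonfiction", "Book_average_rating": 4.5, "units sold": 8000},
]

def mean_rating_by_genre(data):
    """Average Book_average_rating over the rows of each genre."""
    totals = defaultdict(lambda: [0.0, 0])  # genre -> [rating sum, count]
    for r in data:
        acc = totals[r["genre"]]
        acc[0] += r["Book_average_rating"]
        acc[1] += 1
    return {genre: s / n for genre, (s, n) in totals.items()}

# Best seller is simply the row with the largest "units sold" value.
best_seller = max(rows, key=lambda r: r["units sold"])
```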
  16. fdata-02-00048-g0006_Application of a Novel Subject Classification Scheme...

    • frontiersin.figshare.com
    tiff
    Updated May 31, 2023
    Kei Kurakawa; Yuan Sun; Satoko Ando (2023). fdata-02-00048-g0006_Application of a Novel Subject Classification Scheme for a Bibliographic Database Using a Data-Driven Correspondence.tif [Dataset]. http://doi.org/10.3389/fdata.2019.00048.s008
    Explore at:
    tiffAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Kei Kurakawa; Yuan Sun; Satoko Ando
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A novel subject classification scheme must often be applied to a preclassified bibliographic database for research evaluation tasks. Adopting a new subject classification scheme is generally labor intensive and time consuming, so an effective and efficient approach is necessary. Hence, we propose an approach for applying a new subject classification scheme to a subject-classified database using a data-driven correspondence between the new scheme and the present one. In this paper, we define a subject classification model of the bibliographic database comprising a topological space. We then present our approach based on this model, in which forming a compact topological space is required for the novel subject classification scheme. To form the space, a correspondence between the two subject classification schemes, derived from a research project database, is used as data. As a case study, we applied our approach to a practical example: a proprietary tool used worldwide for benchmarking in research evaluation based on a citation database, to which we added a novel subject classification from a research project database.

  17. MARHYS Database 3.0 - Vdataset - LDM

    • service.tib.eu
    Updated Nov 30, 2024
    (2024). MARHYS Database 3.0 - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-958978
    Explore at:
    Dataset updated
    Nov 30, 2024
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is version 3.0 of the MARHYS (MARine HYdrothermal Solutions) database. The database is a global compilation of marine hydrothermal vent fluid compositions. It includes data presented in journal publications, book chapters, and monographs, covering compositions of hydrothermal fluids, end members, and background seawater. Detailed sample information (metadata) is provided along with the chemical composition, which enables the unique identification of discrete vent fluid samples. The database contains concentrations of major, minor, and trace elements, as well as reduced carbon compounds and isotope compositions. All concentrations are converted into units per kg of vent fluid, and chemical data of unique samples reported in different publications are merged into unique sample entries. In brief, the following changes have been applied in the compilation of MARHYS database version 3.0 compared to version 2.0:
    • Data from 126 new references have been added to the database.
    • A large number of new hydrothermal vent sites are now incorporated: Daxi Vent Field, Explorer Vent Site, Kasuga Vent Sites, Kemp Caldera, La Scala Vent Field, Maka Hydrothermal Field, Minami-Ensei, Sea Cliff Hydrothermal Field, Tarama Knoll Vent Sites (and some more).
    • The size of the database has nearly doubled compared to version 2.0 and now contains 6003 merged sample entries with 86738 single parameters determined (compared to 3394 merged entries in version 2.0).
    • The complete sample information section for a large number of entries was again overhauled and harmonized with the help of the new data. The sample information section for some vent sites has improved substantially (e.g. EPR 9°N, Lucky Strike, Lost City).
    • Some new parameters are now included in the database: ∑REE, DOC, δ41K, δ53Cr, δ56Fe, δ57Fe, δ88Sr, 222Rn, 226Ra (to mention some of them).
    • A small number of faulty concentrations was detected and corrected.
    • The reference list was updated.
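The merging step described above, where chemical data for the same vent fluid sample reported in different publications are combined into one entry, can be sketched as follows. The sample ID, parameter names, and values here are hypothetical, and the database's actual merge logic is certainly more involved.

```python
# Hypothetical reports of one vent fluid sample from two publications;
# None marks a parameter that a given publication did not report.
entry_a = {"sample_id": "EPR-9N-001", "Cl_mmol_kg": 540.0, "Fe_umol_kg": None}
entry_b = {"sample_id": "EPR-9N-001", "Cl_mmol_kg": None, "Fe_umol_kg": 12.5}

def merge_entries(entries):
    """Combine reports of the same sample, keeping the first
    non-missing value seen for each parameter."""
    merged = {}
    for entry in entries:
        for key, value in entry.items():
            if merged.get(key) is None:  # unset or still missing
                merged[key] = value
    return merged
```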

  18. zip file: Chapter 17: Examples of correlating, integrating and applying...

    • geolsoc.figshare.com
    zip
    Updated Sep 8, 2022
    Matthew I. Wakefield; Mark W Hounslow; Rory N. Mortimore; Andrew J. Newell; Alastair Ruffell; Mark A. Woods (2022). zip file: Chapter 17: Examples of correlating, integrating and applying stratigraphy and stratigraphical methods [Dataset]. http://doi.org/10.6084/m9.figshare.21062797.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Sep 8, 2022
    Dataset provided by
    Geological Society of Londonhttp://www.geolsoc.org.uk/
    Authors
    Matthew I. Wakefield; Mark W Hounslow; Rory N. Mortimore; Andrew J. Newell; Alastair Ruffell; Mark A. Woods
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A *.zip file containing four files for Worked example 17.1 from the book Deciphering Earth’s History: the Practice of Stratigraphy, which demonstrate the application of sequence slotting with palaeontological data using the CPLSlot freeware. There are two input data text files (extension *.txt) containing the palynological data for the two successions to be correlated; a third file (extension *.sld), readable only by CPLSlot, that stores information about the correlation and slotting models set up when using the software; and a fourth file, an Excel file (extension *.xlsx), that contains the complete workings for Worked example 17.1, including the last processing step of finalizing a correlation model, which is not shown in the worked example in the book.

  19. MARHYS Database 4.0 - Vdataset - LDM

    • service.tib.eu
    Updated Nov 30, 2024
    (2024). MARHYS Database 4.0 - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-972999
    Explore at:
    Dataset updated
    Nov 30, 2024
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is version 4.0 of the MARHYS (MARine HYdrothermal Solutions) database. The database is a global compilation of marine hydrothermal vent fluid compositions. It includes data presented in journal publications, book chapters, and monographs, covering compositions of hydrothermal fluids, end members, and background seawater. Detailed sample information (metadata) is provided along with the chemical composition, which enables the unique identification of discrete vent fluid samples. The database contains concentrations of major, minor, and trace elements, as well as reduced carbon compounds and isotope compositions. All concentrations are converted into units per kg of vent fluid, and chemical data of unique samples reported in different publications are merged into unique sample entries. In brief, the following changes have been applied in the course of the compilation of MARHYS database version 4.0 compared to version 3.0:
    • Two new classes of geological settings have been established: "Intra-plate volcano", with Kamaʻehuakanaloa (Lōʻihi) and Teahitia Vents as representatives, and "Ridge flank", with Baby Bare and ODP 1026 as representatives.
    • Data from 66 new references have been added to the database.
    • Some new vent sites are now incorporated: Jøtul Field, Kamaʻehuakanaloa (Lōʻihi), Nakayama Vent Field, Old City Hydrothermal Field, Saldanha Hydrothermal Field, Teahitia Vents.
    • The size of the database has grown compared to version 3.0 and now contains 6788 merged sample entries, compared to 6003 merged entries in version 3.0.
    • The sample information section has been overhauled for more consistency in the location descriptions.
    • Some new parameters are now included in the database: Eh, TDN (total dissolved nitrogen), Hg, SPE DOC, and a number of isotope ratios.
    • Again, existing data were scanned for erroneous entries, which were corrected wherever found.
    • The reference list was updated and is now presented in APA 7th style, with hyperlinks to the references for more comfortable use.

  20. MARHYS Database 1.0 - Vdataset - LDM

    • service.tib.eu
    Updated Nov 29, 2024
    (2024). MARHYS Database 1.0 - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-921794
    Explore at:
    Dataset updated
    Nov 29, 2024
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MARHYS (MARine HYdrothermal Solutions) database is a global compilation of marine hydrothermal vent fluid compositions. It includes data presented in journal publications, book chapters, and monographs, covering compositions of hydrothermal fluids, end members, and background seawater. Detailed sample information (metadata) is provided along with the chemical composition, enabling the unique identification of discrete vent fluid samples. The database contains concentrations of major, minor, and trace elements, as well as reduced carbon compounds and isotope compositions. All concentrations are converted into units per kg of vent fluid, and chemical data of unique samples reported in different publications are merged into unique sample entries.
