19 datasets found

Book Genome Dataset
kaggle.com
Updated May 30, 2023
Cite
Daniel Young (2023). Book Genome Dataset [Dataset]. https://www.kaggle.com/datasets/youngdaniel/book-genome-dataset
Dataset updated
May 30, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Daniel Young
License
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Description
I uploaded GroupLens' Book Genome dataset to Kaggle. GroupLens doesn't seem to be active here any more, and I wanted to use the data here for some exploratory learning work.

Official link here: https://grouplens.org/datasets/book-genome/

Tag Genome is a data structure containing scores indicating the degree to which tags apply to items, such as movies or books. This dataset contains a Tag Genome generated for a set of books along with the data used for its generation (raw data). Raw data consists of a subset of the Goodreads dataset [Wan and McAuley, 2018; Wan et al., 2019] and book-tag ratings. The Goodreads subset includes information on popular books, such as titles, authors, release years, user ratings, reviews and shelves. Shelves are lists that users use to organize books in Goodreads (https://www.goodreads.com/). In these instructions, we refer to adding books to shelves as attaching tags (shelf names) to books. To collect book-tag ratings, we conducted a survey on Amazon Mechanical Turk, where we asked users to indicate the degree to which tags apply to books from this subset. To generate book-tag scores, we used two state-of-the-art algorithms: Glmer [Vig et al., 2012] and TagDL [Kotkov et al., 2021]. The code is available in the following GitHub repository: https://github.com/Bionic1251/Revisiting-the-Tag-Relevance-Prediction-Problem
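
As a rough illustration of the data structure described above, a Tag Genome can be pictured as a table of scores indexed by item and tag. The sketch below is not the dataset's actual file layout: the book names, tags, score values, the assumed 0 to 1 score range, and the helper names `top_tags` and `items_for_tag` are all hypothetical.

```python
# Hypothetical Tag Genome: item -> tag -> score.
# Titles, tags, and values are made up; scores are assumed to lie in [0, 1].
genome = {
    "Book A": {"fantasy": 0.92, "romance": 0.10, "dark": 0.55},
    "Book B": {"fantasy": 0.15, "romance": 0.88, "dark": 0.20},
}


def top_tags(genome, item, n=2):
    """Return the n tags with the highest scores for one item."""
    scores = genome[item]
    return sorted(scores, key=scores.get, reverse=True)[:n]


def items_for_tag(genome, tag, threshold=0.5):
    """Return the items whose score for `tag` is at least `threshold`."""
    return [
        item
        for item, scores in genome.items()
        if scores.get(tag, 0.0) >= threshold
    ]
```

Storing a score for every item-tag pair, rather than only the tags users attached, is what lets both lookups work in either direction.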
BL Labs Flickr Data: Book data and tag history (Dec 2013 - Dec 2014)
figshare.com
zip
Updated May 30, 2023
Cite
Ben O'steen; James Baker (2023). BL Labs Flickr Data: Book data and tag history (Dec 2013 - Dec 2014) [Dataset]. http://doi.org/10.6084/m9.figshare.1269249.v2
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.1269249.v2
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Ben O'steen; James Baker
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Contains the tag information for the 1 million+ images uploaded to the British Library Flickr Commons account.

taghistory.zip - contains a single .tsv file that lists the tags that were added to (or removed from) images on the Flickr Commons account during the first year. NB: There was an issue getting information from Flickr for the first few months, so early information is not available.

book_data.zip - contains a .json file that holds a list of records, one for each digitised work. Each record holds information on the work's title, authors and so on, the images on Flickr that correspond to it, and the identifier required to download PDF version(s) of the entire work.
Public tags added to resources in Trove, 2008 to 2024
zenodo.org
data.niaid.nih.gov
zip
Updated Jun 6, 2024
Cite
Tim Sherratt (2024). Public tags added to resources in Trove, 2008 to 2024 [Dataset]. http://doi.org/10.5281/zenodo.11496377
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.11496377
Dataset updated
Jun 6, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Tim Sherratt
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset contains details of 2,495,958 unique public tags added to 10,403,650 resources in Trove between August 2008 and June 2024. I harvested the data using the Trove API and saved it as a CSV file with the following columns:

`tag` – lower-cased text tag

`date` – date the tag was added

`zone` – API zone containing the tagged resource

`record_id` – the identifier of the tagged resource

I've documented the method used to harvest the tags in this notebook.

Using the `zone` and `record_id` you can find more information about a tagged item. To create urls to the resources in Trove:

for resources in the 'book', 'article', 'picture', 'music', 'map', and 'collection' zones add the `record_id` to `https://trove.nla.gov.au/work/`

for resources in the 'newspaper' and 'gazette' zones add the `record_id` to `https://trove.nla.gov.au/article/`

for resources in the 'list' zone add the `record_id` to `https://trove.nla.gov.au/list/`
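
The URL rules above can be sketched as a small lookup. This is a minimal illustration, not the author's code: the function name `trove_url` and the choice to raise an error for an unlisted zone are assumptions, and any `record_id` you pass in is just a placeholder value.

```python
# Zone groups and URL prefixes taken from the rules listed above.
WORK_ZONES = {"book", "article", "picture", "music", "map", "collection"}
ARTICLE_ZONES = {"newspaper", "gazette"}
LIST_ZONES = {"list"}


def trove_url(zone, record_id):
    """Build a Trove URL from a row's `zone` and `record_id` values."""
    if zone in WORK_ZONES:
        return f"https://trove.nla.gov.au/work/{record_id}"
    if zone in ARTICLE_ZONES:
        return f"https://trove.nla.gov.au/article/{record_id}"
    if zone in LIST_ZONES:
        return f"https://trove.nla.gov.au/list/{record_id}"
    # Assumption: a zone not covered by the rules is treated as an error.
    raise ValueError(f"Unknown zone: {zone!r}")
```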

Notes:

Works (such as books) in Trove can have tags attached at either work or version level. This dataset aggregates all tags at the work level, removing any duplicates.

A single resource in Trove can appear in multiple zones – for example, a book that includes maps and illustrations might appear in the 'book', 'picture', and 'map' zones. This means that some of the tags will essentially be duplicates – harvested from different zones, but relating to the same resource. Depending on your needs, you might want to remove these duplicates.

While most of the tags were added by Trove users, more than 500,000 tags were added by Trove itself in November 2009. I think these tags were automatically generated from related Wikipedia pages. Depending on your needs, you might want to exclude these by limiting the date range or zones.

User content added to Trove, including tags, is available for reuse under a CC-BY-NC licence.
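
The two optional clean-up steps mentioned in the notes (dropping cross-zone duplicates and excluding the November 2009 tags) might look like the sketch below. It makes several assumptions: rows are dictionaries keyed by the four column names, `date` is a string beginning with `YYYY-MM`, and the same `record_id` identifies the same resource across zones that share a URL prefix. The function names are hypothetical.

```python
# URL namespace for each zone, following the URL rules in the description.
ZONE_NAMESPACE = {
    "book": "work", "article": "work", "picture": "work",
    "music": "work", "map": "work", "collection": "work",
    "newspaper": "article", "gazette": "article",
    "list": "list",
}


def drop_cross_zone_duplicates(rows):
    """Keep the first row for each (tag, namespace, record_id) combination."""
    seen = set()
    unique = []
    for row in rows:
        namespace = ZONE_NAMESPACE.get(row["zone"], row["zone"])
        key = (row["tag"], namespace, row["record_id"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique


def exclude_month(rows, year_month="2009-11"):
    """Drop rows whose date falls in the given month (assumes YYYY-MM... strings)."""
    return [row for row in rows if not row["date"].startswith(year_month)]
```

Including the namespace in the key keeps a newspaper article and a work that happen to share a numeric identifier from being merged.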

See this notebook for some examples of how you can manipulate, analyse, and visualise the tag data.
Dataset of authors, books and publication dates of book subjects where books...
workwithdata.com
Updated Nov 7, 2024
Cite
Work With Data (2024). Dataset of authors, books and publication dates of book subjects where books equals Tag [Dataset]. https://www.workwithdata.com/datasets/book-subjects?col=book_subject%2Cj0-author%2Cj0-book%2Cj0-publication_date&f=1&fcol0=j0-book&fop0=%3D&fval0=Tag&j=1&j0=books
Dataset updated
Nov 7, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about book subjects. It has 13 rows and is filtered to rows where the book is Tag. It features 4 columns: book subjects, authors, books, and publication dates.
Booklet Label Market Size & Share Analysis - Industry Research Report -...
mordorintelligence.com
pdf,excel,csv,ppt
Updated Mar 20, 2025
Cite
Mordor Intelligence (2025). Booklet Label Market Size & Share Analysis - Industry Research Report - Growth Trends [Dataset]. https://www.mordorintelligence.com/industry-reports/booklet-label-market
Available download formats: pdf, excel, csv, ppt
Dataset updated
Mar 20, 2025
Dataset authored and provided by
Mordor Intelligence
License
https://www.mordorintelligence.com/privacy-policy
Time period covered
2019 - 2030
Area covered
Global
Description
The Booklet Label Market report segments the industry into By Product Type (Multi-Panel Labels, Peel-and-Reveal Labels), By Material (Paper, Film), By Printing Technology (Flexographic, Digital, Offset, Other Printing Technology), By End-Use Industry (Pharmaceuticals, Food and Beverage, Chemicals, Cosmetics and Personal Care, Other End-use Industries), and By Geography (North America, Europe, Asia, and more).
Data from: Development and application of a novel approach to scoring ear...
datadryad.org
data.niaid.nih.gov
+1 more
zip
Updated Apr 7, 2023
Cite
Megan Lynn Harmon; Blair Caitlin Downey; Alycia Marie Drwencke; Cassandra Blaine Tucker (2023). Development and application of a novel approach to scoring ear tag wounds in dairy calves [Dataset]. http://doi.org/10.25338/B8BS8J
Available download formats: zip
Unique identifier
https://doi.org/10.25338/B8BS8J
Dataset updated
Apr 7, 2023
Dataset provided by
Dryad
Authors
Megan Lynn Harmon; Blair Caitlin Downey; Alycia Marie Drwencke; Cassandra Blaine Tucker
Time period covered
Apr 4, 2023
Description
Application of ear tags in cattle is a common husbandry practice for identification purposes. While it is known that ear tag application causes damage, little is known about the duration and process of wound healing associated with this procedure. Our objective was to quantify the wound healing progression in dairy calves with plastic identification tags. Calves (n=33) were ear tagged at 2 d of age and wound photos were taken weekly until 9–22 wk of age. This approach generated 10–22 observations per calf that were analyzed using a novel wound-scoring system. We developed this system to score the presence or absence of 8 different tissue types related to piercing trauma or mechanical irritation along the top of the tag (impressions, crust, and desquamation) and around the piercing (exudate, crust, tissue growth, and desquamation). Ears were scored as undamaged when tissue was intact. We found that wound tissue types associated with damage were still seen in many calves for at least 12 w...
Dataset of authors, books and publication dates of book series where books...
workwithdata.com
Updated Nov 25, 2024
Cite
Work With Data (2024). Dataset of authors, books and publication dates of book series where books equals Cards & tags [Dataset]. https://www.workwithdata.com/datasets/book-series?col=book_series%2Cj0-author%2Cj0-book%2Cj0-publication_date&f=1&fcol0=j0-book&fop0=%3D&fval0=Cards+%26+tags&j=1&j0=books
Dataset updated
Nov 25, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about book series. It has 1 row and is filtered to rows where the book is Cards & tags. It features 4 columns: book series, authors, books, and publication dates.
Producer Price Index by Industry: Commercial Printing, Except Screen and...
fred.stlouisfed.org
json
Updated Jul 16, 2025
+ more versions
Cite
(2025). Producer Price Index by Industry: Commercial Printing, Except Screen and Books: Label and Wrapper Printing (Lithographic) [Dataset]. https://fred.stlouisfed.org/series/PCU32311K32311K03
Available download formats: json
Dataset updated
Jul 16, 2025
License
https://fred.stlouisfed.org/legal/#copyright-public-domain
Description
Graph and download economic data for Producer Price Index by Industry: Commercial Printing, Except Screen and Books: Label and Wrapper Printing (Lithographic) (PCU32311K32311K03) from Jun 1982 to Jun 2025 about book, printing, commercial, PPI, industry, inflation, price index, indexes, price, and USA.
Dataset of books about Warning labels-Humor
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books about Warning labels-Humor [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=j0-book_subject&fop0=%3D&fval0=Warning+labels-Humor&j=1&j0=book_subjects
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 3 rows and is filtered to books whose subject is Warning labels-Humor. It features 9 columns, including author, publication date, language, and book publisher.
Dataset of author, BNB id, book publisher, and publication date of Cards,...
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of author, BNB id, book publisher, and publication date of Cards, wrap and tags [Dataset]. https://www.workwithdata.com/datasets/books?col=author%2Cbnb_id%2Cbook%2Cbook%2Cbook_publisher%2Cpublication_date&f=1&fcol0=book&fop0=%3D&fval0=Cards%2C+wrap+and+tags
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 2 rows and is filtered to rows where the book is Cards, wrap and tags. It features 5 columns: author, BNB id, book, book publisher, and publication date.
Data from: People versus Books
explore.openaire.eu
zenodo.org
Updated Jul 6, 2021
Cite
Sarah Bowen Savant; Masoumeh Seydi (2021). People versus Books [Dataset]. http://doi.org/10.5281/zenodo.5074632
Unique identifier
https://doi.org/10.5281/zenodo.5074632
Dataset updated
Jul 6, 2021
Authors
Sarah Bowen Savant; Masoumeh Seydi
Description
This explanation pertains to the data prepared for Non sola scriptura: Essays on the Qur’an and Islam in Honour of William A. Graham (Routledge), chapter by Sarah Bowen Savant, “People versus Books.” We are releasing the data that was used to create Graphs 1 and 2 and Tables 1-3 for the chapter. Note: All the data files (except the text in number 3) are in TSV (tab-separated values) format, and any text editor or tabular data editor, such as Excel, can open them.

“IsnadFractions_PeopleversusBooks”. This file represents a filtered version of an output from Ryan Muther’s isnād classifier algorithm. Muther ran the algorithm in July 2020, based on the Version 2020.1.2 release of the corpus, available at: http://doi.org/10.5281/zenodo.3891466. The data file includes:
author: the name of the author.
died: death date of the author. NB: Especially the early dates cannot be relied on.
title: the title of the author’s book, from the OpenITI Corpus.
length: length of the book, measured in word-tokens.
isnad_fraction: the percentage of the book’s word-tokens that are made up of isnāds.

“GALTags_PeopleversusBooks”. Books in the OpenITI were mapped by Walid A. Akef in 2018 to: Brockelmann, Carl, History of the Arabic Written Traditions, trans. Joep Lameer, 2 vols and 3 supplements, Leiden: Brill, 2016-2018. The file includes the following columns:
id: book id, from the OpenITI Corpus.
gal_tags: the GAL tags, also used in the OpenITI Corpus.

“0571IbnCasakir.TarikhDimashq.JK000916-ara1.mARkdown”. The Ibn ʿAsākir text file, from the Version 2020.1.2 release of the OpenITI Corpus.

“NamedEntities_PeopleversusBooks”. This is a very first effort at working on named entities in Ibn ʿAsākir’s Taʾrīkh Madīnat Dimashq and represents only a tiny fraction of the surface forms of names. Most of the names pertain to persons who transmitted from Ibn Saʿd. There may be some duplicate surface forms (which does not affect the method). We use this list to replace the surface forms with transliterated values. The columns are:
name: the normalized name.
ar_name: the Arabic name, i.e. the surface form.
status: true (T)/false (F) values to include/exclude the cases in the replacement process. We have used true values.

“SplittingTerms_PeopleversusBooks”. We started with a list of transmissive terms that R. Kevin Jaques originated and then added more terms, which include the various normalized forms of the same term. We used this list to split isnāds into names.

“IbnSadIsnads_PeopleversusBooks”. This file includes the pieces of text that the algorithm tags as isnāds in the text. We extracted the tagged pieces and made a list of isnāds. Almost all of the isnāds start with a transmissive term. We use this file to extract the names and clean some rows to generate a data table that we can use for clustering. The columns are:
text_ID: the book id from the OpenITI Corpus. This column can be ignored as we are using it for one text in this project. However, it is required in the collection of isnāds from multiple texts.
id: a unique identifier assigned to each isnād. The isnād classifier algorithm assigns this id, and it can be used to identify each isnād in the text when required.
isnad_text: the isnād that we extract from the text.
length: length of the extracted isnād in tokens.

“IsnadNames_PeopleversusBooks”. This file is the isnāds list (number 5 on this list) split by the transmissive terms (number 4 on this list) in order to extract the names in the isnāds. The columns are:
text_ID: the book id from the OpenITI Corpus. This column can be ignored as we are using it for one text in this project. However, it is required in the collection of isnāds from multiple texts.
isnad_text: the isnād that we extract from the text.
ibnSad_cnt: number of times that the name Ibn Saʿd is mentioned in the corresponding isnād.
name_at_position_X: the rest of the columns in this table include the pieces of the isnād that we get after splitting the isnāds with a list of terms. Each column contains a name or any string that appears between two transmissive terms. Some cells are empty, probably because we missed some transmissive terms.

“IbnSadClusters_PeopleversusBooks”. This file includes clusters of isnāds of length six (i.e. isnāds that include six names). We have used the affinity propagation (AP) clustering algorithm based on the Levenshtein similarity score of the names. The columns are:
frequency: the frequency of the isnād in the data.
cluster_id: the id of the cluster to which the isnād belongs.
nameX: columns C to H include the names in the isnād at positions 1 to 6, running back to Muhammad b. Saʿd at position 6.

“JK000916-ara1.mARkdown_Shamela0001686-ara1.completed”. This is the passim output from the February 2020 run (which used the same version of the corpus; Version 2020.1.2). For definition of fields in this file, please see...
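
To make two of the steps described above concrete, the sketch below splits an isnād on transmissive terms and computes an isnād fraction. It is only an illustration under stated assumptions, not the authors' code: the terms are ASCII placeholders rather than entries from the SplittingTerms file, the function names are hypothetical, and the fraction is assumed to be expressed on a 0 to 100 scale.

```python
import re

# Placeholder transmissive terms (hypothetical, not from the dataset's list).
SPLITTING_TERMS = ["haddathana", "akhbarana", "an"]


def split_isnad(isnad_text, terms=SPLITTING_TERMS):
    """Return the strings that appear between transmissive terms."""
    # Try longer terms first so a short term never shadows a longer one.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # Match a term only when it stands alone as a whitespace-delimited token.
    pattern = re.compile(rf"(?<!\S)(?:{alternation})(?!\S)")
    return [piece.strip() for piece in pattern.split(isnad_text) if piece.strip()]


def isnad_fraction(isnad_lengths, book_length):
    """Percentage of a book's word-tokens that fall inside isnāds."""
    return 100.0 * sum(isnad_lengths) / book_length
```

Each string returned by `split_isnad` would correspond to one name_at_position_X cell; a missed transmissive term leaves two names fused in one piece, which is consistent with the empty cells the description mentions.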
Producer Price Index by Industry: Commercial Printing, Except Screen and...
fred.stlouisfed.org
json
Updated Jul 16, 2025
+ more versions
Cite
(2025). Producer Price Index by Industry: Commercial Printing, Except Screen and Books: Label and Wrapper Printing (Flexographic) [Dataset]. https://fred.stlouisfed.org/series/PCU32311K32311K21
Available download formats: json
Dataset updated
Jul 16, 2025
License
https://fred.stlouisfed.org/legal/#copyright-public-domain
Description
Graph and download economic data for Producer Price Index by Industry: Commercial Printing, Except Screen and Books: Label and Wrapper Printing (Flexographic) (PCU32311K32311K21) from Dec 2001 to Jun 2025 about book, printing, commercial, PPI, industry, inflation, price index, indexes, price, and USA.
Dataset of books about Matchbox labels, British-Collectors and collecting
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books about Matchbox labels, British-Collectors and collecting [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=j0-book_subject&fop0=%3D&fval0=Matchbox+labels%2C+British-Collectors+and+collecting&j=1&j0=book_subjects
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 4 rows and is filtered to books whose subject is Matchbox labels, British-Collectors and collecting. It features 9 columns, including author, publication date, language, and book publisher.
Market Analysis for RULE BOOK 4.0
nsc.onl
Updated Aug 8, 2025
Cite
(2025). Market Analysis for RULE BOOK 4.0 [Dataset]. https://nsc.onl/cards/tag/665619/rule-book-4-0
Dataset updated
Aug 8, 2025
Variables measured
Countries, Price Range, Median Price, Average Price, Sold Listings, Total Listings, Active Listings, Unsold Listings, Number of Sellers, Sell-Through Rate
Description
Comprehensive market data and analytics for RULE BOOK 4.0 including pricing distribution, seller metrics, and market trends.
Dataset of books called The Parlophone red label popular series, E5000 -...
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called The Parlophone red label popular series, E5000 - E6428 [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=The+Parlophone+red+label+popular+series%2C+E5000+-+E6428
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered to rows where the book is The Parlophone red label popular series, E5000 - E6428. It features 7 columns, including author, publication date, language, and book publisher.
Massive units emplaced by bedload transport in sheet flow mode
search.dataone.org
Updated Feb 15, 2017
Cite
Ricardo Hernandez Moreira (2017). Massive units emplaced by bedload transport in sheet flow mode [Dataset]. https://search.dataone.org/view/seadva-RicardoHernandezMoreira-f77ec8c0-2a42-4c1a-bfe6-992db22b6239
Dataset updated
Feb 15, 2017
Dataset provided by
SEAD Virtual Archive
Authors
Ricardo Hernandez Moreira
Time period covered
Feb 15, 2017
Area covered
Earth
Description
Herein we present data collected during experiments on massive deposition in upper regime. (Refer to http://sedexp.net/experiment/experiments-massive-deposits-upper-regime for more information on the experimental setup).

The data are separated as follows: 00-profiles: Water surface and bed elevation profiles. 01-sonar data: Instantaneous realizations of bed elevation fluctuations captured by JSR ultrasonic probes. 02-media: collection of pictures, time-lapses and movies corresponding to the experiments.

Data are divided by flow rate (i.e., 20 l/s, 30 l/s), by feed rate (e.g., 1.5 kg/min, 8 kg/min, 16 kg/min) and by experiment type (i.e., equilibrium or aggradational runs), wherever appropriate.
TMTpro: Design and initial evaluation of a novel Proline-based isobaric...
ebi.ac.uk
Updated Nov 25, 2019
+ more versions
Cite
Michael Bremang (2019). TMTpro: Design and initial evaluation of a novel Proline-based isobaric 16-plex Tandem Mass Tag reagent set [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD014750
Dataset updated
Nov 25, 2019
Authors
Michael Bremang
Variables measured
Proteomics
Description
The design and synthesis of a novel proline-reporter-based isobaric Tandem Mass Tag 16 tag set (TMTpro) was carried out and the data uploaded here is a comparison of the performance of the new TMTpro tags with the current commercially available dimethylpiperidine-reporter-based TMT10/11 reagents. Data from 2 experiments are provided.
Data used in Fig 3B.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated Jun 8, 2023
Cite
Michael Francis; Changwei Li; Yitang Sun; Jingqi Zhou; Xiang Li; J. Thomas Brenna; Kaixiong Ye (2023). Data used in Fig 3B. [Dataset]. http://doi.org/10.1371/journal.pgen.1009431.s011
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pgen.1009431.s011
Dataset updated
Jun 8, 2023
Dataset provided by
PLOS Genetics
Authors
Michael Francis; Changwei Li; Yitang Sun; Jingqi Zhou; Xiang Li; J. Thomas Brenna; Kaixiong Ye
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Fish oil status, number of G alleles at rs112803755, mean triglycerides, sample size, standard deviation of triglycerides, and 95% confidence interval for combined participants from Stage 1 and Stage 2. (XLSX)
Math equations for tag generation.
plos.figshare.com
figshare.com
xls
Updated Jun 4, 2023
Cite
Reem Almarwani; Ning Zhang; James Garside (2023). Math equations for tag generation. [Dataset]. http://doi.org/10.1371/journal.pone.0244731.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0244731.t002
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Reem Almarwani; Ning Zhang; James Garside
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Math equations for tag generation.
Book Genome Dataset: GroupLens's Tag Genome dataset for Books

315 scholarly articles cite this dataset (per Google Scholar).
