Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
I uploaded GroupLens' Book Genome dataset on Kaggle. It doesn't seem like they're active here any more and I want to use this here for some exploratory learning work I did.
Official link here: https://grouplens.org/datasets/book-genome/
Tag Genome is a data structure containing scores indicating the degree to which tags apply to items, such as movies or books. This dataset contains a Tag Genome generated for a set of books along with the data used for its generation (raw data). Raw data consists of a subset of the Goodreads dataset [Wan and McAuley, 2018, Wan et al., 2019] and book-tag ratings. The Goodreads subset includes information on popular books, such as titles, authors, release years, user ratings, reviews and shelves. Shelves are lists that users use to organize books in Goodreads (https://www.goodreads.com/). In these instructions, we refer to adding books to shelves as attaching tags (shelf names) to books. To collect book-tag ratings, we conducted a survey on Amazon Mechanical Turk, where we asked users to indicate the degree to which tags apply to books from this subset. To generate book-tag scores, we used two state-of-the-art algorithms: Glmer [Vig et al., 2012] and TagDL [Kotkov et al., 2021]. The code is available in the following GitHub repository: https://github.com/Bionic1251/Revisiting-the-Tag-Relevance-Prediction-Problem
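To make the idea of a Tag Genome concrete, here is a minimal Python sketch of an item-to-tag score mapping and a lookup of the most relevant tags for one book. The book names, tags, scores, the assumed 0-to-1 score range, and the helper names build_genome and top_tags are all illustrative assumptions; none of them come from the dataset files, whose actual layout may differ.

```python
from collections import defaultdict

# Hypothetical (book, tag, score) rows; real values come from the dataset.
# Scores are assumed to lie between 0 (tag does not apply) and 1 (tag fully applies).
BOOK_TAG_SCORES = [
    ("Book A", "fantasy", 0.95),
    ("Book A", "romance", 0.10),
    ("Book A", "magic", 0.80),
    ("Book B", "fantasy", 0.05),
    ("Book B", "romance", 0.90),
]


def build_genome(rows):
    """Group rows into a nested mapping: item -> {tag: score}."""
    genome = defaultdict(dict)
    for item, tag, score in rows:
        genome[item][tag] = score
    return dict(genome)


def top_tags(genome, item, k=2):
    """Return the k tags with the highest scores for one item."""
    scores = genome.get(item, {})
    return sorted(scores, key=scores.get, reverse=True)[:k]


genome = build_genome(BOOK_TAG_SCORES)
print(top_tags(genome, "Book A"))
```

A nested dictionary keeps per-book lookups cheap; for the full dataset a tabular structure would likely be a better fit.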
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains the tag information for the 1 million+ images uploaded to the British Library Flickr Commons account. taghistory.zip - contains a single .tsv file that lists the tags that were added to (or removed from) images on the Flickr Commons account during the first year. NB There was an issue getting information from Flickr for the first few months, so early information is not available.
book_data.zip - contains a .json file that holds a list of records, one for each digitised work. Each record holds information on the work's title, authors and so on, the images on Flickr that correspond to it, and the identifier required to download PDF version(s) of the entire work.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains details of 2,495,958 unique public tags added to 10,403,650 resources in Trove between August 2008 and June 2024. I harvested the data using the Trove API and saved it as a CSV file with the following columns:
I've documented the method used to harvest the tags in this notebook.
Using the `zone` and `record_id` you can find more information about a tagged item. To create urls to the resources in Trove:
Notes:
See this notebook for some examples of how you can manipulate, analyse, and visualise the tag data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 13 rows and is filtered where the books is Tag. It features 4 columns: authors, books, and publication dates.
https://www.mordorintelligence.com/privacy-policy
The Booklet Label Market report segments the industry into By Product Type (Multi-Panel Labels, Peel-and-Reveal Labels), By Material (Paper, Film), By Printing Technology (Flexographic, Digital, Offset, Other Printing Technology), By End-Use Industry (Pharmaceuticals, Food and Beverage, Chemicals, Cosmetics and Personal Care, Other End-use Industries), and By Geography (North America, Europe, Asia, and more).
Application of ear tags in cattle is a common husbandry practice for identification purposes. While it is known that ear tag application causes damage, little is known about the duration and process of wound healing associated with this procedure. Our objective was to quantify the wound healing progression in dairy calves with plastic identification tags. Calves (n=33) were ear tagged at 2 d of age and wound photos were taken weekly until 9–22 wk of age. This approach generated 10–22 observations per calf that were analyzed using a novel wound-scoring system. We developed this system to score the presence or absence of 8 different tissue types related to piercing trauma or mechanical irritation along the top of the tag (impressions, crust, and desquamation) and around the piercing (exudate, crust, tissue growth, and desquamation). Ears were scored as undamaged when tissue was intact. We found that wound tissue types associated with damage were still seen in many calves for at least 12 w...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book series. It has 1 row and is filtered where the books is Cards & tags. It features 4 columns: authors, books, and publication dates.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Producer Price Index by Industry: Commercial Printing, Except Screen and Books: Label and Wrapper Printing (Lithographic) (PCU32311K32311K03) from Jun 1982 to Jun 2025 about book, printing, commercial, PPI, industry, inflation, price index, indexes, price, and USA.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 3 rows and is filtered where the book subjects is Warning labels-Humor. It features 9 columns including author, publication date, language, and book publisher.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 2 rows and is filtered where the book is Cards, wrap and tags. It features 5 columns: author, publication date, book publisher, and BNB id.
This explanation pertains to the data prepared for Non sola scriptura: Essays on the Qur’an and Islam in Honour of William A. Graham (Routledge), Chapter by Sarah Bowen Savant, “People versus Books.” We are releasing the data that was used to create Graphs 1 and 2 and Tables 1-3 for the chapter. Note: All the data files (except the text in number 3) are in TSV format (Tab Separated Values), which any text editor or tabular data editor, such as Excel, can open.

“IsnadFractions_PeopleversusBooks”. This file represents a filtered version of an output from Ryan Muther’s isnād classifier algorithm. Muther ran the algorithm in July 2020, based on the Version 2020.1.2 release of the corpus, available at: http://doi.org/10.5281/zenodo.3891466. The data file includes:
author: the name of the author.
died: death date of the author. NB: Especially the early dates cannot be relied on.
title: the title of the author’s book, from the OpenITI Corpus.
length: length of the book, measured in word-tokens.
isnad_fraction: the percentage of the book’s word-tokens that are made up of isnāds.

“GALTags_PeopleversusBooks”. Books in the OpenITI were mapped by Walid A. Akef in 2018 to: Brockelmann, Carl, History of the Arabic Written Traditions, trans. Joep Lameer, 2 vols and 3 supplements, Leiden: Brill, 2016-2018. The file includes the following columns:
id: book id, from the OpenITI Corpus.
gal_tags: the GAL tags, also used in the OpenITI Corpus.

“0571IbnCasakir.TarikhDimashq.JK000916-ara1.mARkdown”. The Ibn ʿAsākir text file, from the Version 2020.1.2 release of the OpenITI Corpus.

“NamedEntities_PeopleversusBooks”. This is a very first effort at working on named entities in Ibn ʿAsākir’s Taʾrīkh Madīnat Dimashq and represents only a tiny fraction of the surface forms of names. Most of the names pertain to persons who transmitted from Ibn Saʿd. There may be some duplicate surface forms (which does not affect the method). We use this list to replace the surface forms with transliterated values. The columns are described below:
name: the normalized name.
ar_name: the Arabic name, i.e. the surface form.
status: true (T)/false (F) values to include/exclude the cases in the replacement process. We have used true values.

“SplittingTerms_PeopleversusBooks”. We started with a list of transmissive terms that R. Kevin Jaques originated and then added more terms, which include the various normalized forms of the same term. We used this list to split isnāds into names.

“IbnSadIsnads_PeopleversusBooks”. This file includes the pieces of text that the algorithm tags as isnāds in the text. We extracted the tagged pieces and made a list of isnāds. Almost all of the isnāds start with a transmissive term. We use this file to extract the names and clean some rows to generate a data table that we can use for clustering. The columns are briefly described below:
text_ID: this contains the book id from the OpenITI Corpus. This column can be ignored, as we are using it for one text in this project. However, it is required when collecting isnāds from multiple texts.
id: a unique identifier assigned to each isnād. The isnād classifier algorithm assigns this id, and it can be used to identify each isnād in the text when required.
isnad_text: the isnād that we extract from the text.
length: length of the extracted isnād in tokens.

“IsnadNames_PeopleversusBooks”. This file is the isnāds list (number 5 on this list) split by the transmissive terms (number 4 on this list) in order to extract the names in the isnāds. The columns are as follows:
text_ID: this contains the book id from the OpenITI corpus. This column can be ignored, as we are using it for one text in this project. However, it is required when collecting isnāds from multiple texts.
isnad_text: this column is the isnād that we extract from the text.
ibnSad_cnt: number of times that the name Ibn Saʿd is mentioned in the corresponding isnād.
name_at_position_X: the rest of the columns in this table include the pieces of the isnād that we get after splitting the isnāds with a list of terms. Each column contains a name or any string that appears between two transmissive terms. Some cells are empty, probably because we missed some transmissive terms.

“IbnSadClusters_PeopleversusBooks”. This file includes clusters of isnāds of length six (i.e. isnāds that include six names). We have used the affinity propagation (AP) clustering algorithm based on the Levenshtein similarity score of the names. The columns are described below:
frequency: the frequency of the isnād in the data.
cluster_id: the id of the cluster to which the isnād belongs.
nameX: columns C to H include the names in the isnād at positions 1 to 6, running back to Muhammad b. Saʿd at position 6.

“JK000916-ara1.mARkdown_Shamela0001686-ara1.completed”. This is the passim output from the February 2020 run (which used the same version of the corpus; Version 2020.1.2). For definition of fields in this file, please see...
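The two text-processing steps described for this dataset, splitting an isnād on transmissive terms and comparing names by Levenshtein similarity, can be sketched in Python as follows. This is only an illustration under stated assumptions: the transliterated terms, the sample isnād, and the function names are hypothetical, and normalizing the edit distance by the longer string's length is one common choice, not necessarily the one the authors used.

```python
import re

# Hypothetical, transliterated transmissive terms; the released
# SplittingTerms file holds the actual list.
TRANSMISSIVE_TERMS = ["haddathana", "akhbarana", "an"]


def split_isnad(isnad_text, terms=TRANSMISSIVE_TERMS):
    """Return the strings found between transmissive terms."""
    # Longest terms first so a short term never shadows a longer one.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # Match a term only when it stands alone between whitespace or string edges.
    pattern = r"(?<!\S)(?:" + alternation + r")(?!\S)"
    pieces = re.split(pattern, isnad_text)
    return [piece.strip() for piece in pieces if piece.strip()]


def levenshtein(a, b):
    """Edit distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalization: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical isnad, transliterated for readability.
names = split_isnad("haddathana Ahmad b. Ali an Muhammad b. Sad")
print(names)
print(similarity("Muhammad b. Sad", "Muhammad b. Saad"))
```

A pairwise similarity matrix built this way is the kind of input that affinity propagation accepts as precomputed similarities.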
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Producer Price Index by Industry: Commercial Printing, Except Screen and Books: Label and Wrapper Printing (Flexographic) (PCU32311K32311K21) from Dec 2001 to Jun 2025 about book, printing, commercial, PPI, industry, inflation, price index, indexes, price, and USA.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 4 rows and is filtered where the book subjects is Matchbox labels, British-Collectors and collecting. It features 9 columns including author, publication date, language, and book publisher.
Comprehensive market data and analytics for RULE BOOK 4.0 including pricing distribution, seller metrics, and market trends.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is The Parlophone red label popular series, E5000 - E6428. It features 7 columns including author, publication date, language, and book publisher.
Herein we present data collected during experiments on massive deposition in upper regime. (Refer to http://sedexp.net/experiment/experiments-massive-deposits-upper-regime for more information on the experimental setup).
The data are separated as follows: 00-profiles: Water surface and bed elevation profiles. 01-sonar data: Instantaneous realizations of bed elevation fluctuations captured by JSR ultrasonic probes. 02-media: collection of pictures, time-lapses and movies corresponding to the experiments.
Data are divided by flow rate (i.e., 20 l/s, 30 l/s), by feed rate (e.g., 1.5 kg/min, 8 kg/min, 16 kg/min) and by experiment type (i.e., equilibrium or aggradational runs), wherever appropriate.
A novel proline-reporter-based isobaric Tandem Mass Tag 16 tag set (TMTpro) was designed and synthesized. The data uploaded here compare the performance of the new TMTpro tags with the current commercially available dimethylpiperidine-reporter-based TMT10/11 reagents. Data from 2 experiments are provided.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fish oil status, number of G alleles at rs112803755, mean triglycerides, sample size, standard deviation of triglycerides, and 95% confidence interval for combined participants from Stage 1 and Stage 2. (XLSX)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Math equations for tag generation.