6 datasets found
  1. Best Books Ever Dataset

    • zenodo.org
    csv
    Updated Nov 10, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lorena Casanova Lozano; Sergio Costa Planells; Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
    Explore at:
    csvAvailable download formats
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Lorena Casanova Lozano; Sergio Costa Planells; Lorena Casanova Lozano; Sergio Costa Planells
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset has been collected in the frame of the Prac1 of the subject Tipology and Data Life Cycle of the Master's Degree in Data Science of the Universitat Oberta de Catalunya (UOC).

    The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the larges list on the site).

    Original code used to retrieve the dataset can be found on github repository: github.com/scostap/goodreads_bbe_dataset

    The data was retrieved in two sets, the first 30000 books and then the remainig 22478. Dates were not parsed and reformated on the second chunk so publishDate and firstPublishDate are representet in a mm/dd/yyyy format for the first 30000 records and Month Day Year for the rest.

    Book cover images can be optionally downloaded from the url in the 'coverImg' field. Python code for doing so and an example can be found on the github repo.

    The 25 fields of the dataset are:

    | Attributes | Definition | Completeness |
    | ------------- | ------------- | ------------- | 
    | bookId | Book Identifier as in goodreads.com | 100 |
    | title | Book title | 100 |
    | series | Series Name | 45 |
    | author | Book's Author | 100 |
    | rating | Global goodreads rating | 100 |
    | description | Book's description | 97 |
    | language | Book's language | 93 |
    | isbn | Book's ISBN | 92 |
    | genres | Book's genres | 91 |
    | characters | Main characters | 26 |
    | bookFormat | Type of binding | 97 |
    | edition | Type of edition (ex. Anniversary Edition) | 9 |
    | pages | Number of pages | 96 |
    | publisher | Editorial | 93 |
    | publishDate | publication date | 98 |
    | firstPublishDate | Publication date of first edition | 59 |
    | awards | List of awards | 20 |
    | numRatings | Number of total ratings | 100 |
    | ratingsByStars | Number of ratings by stars | 97 |
    | likedPercent | Derived field, percent of ratings over 2 starts (as in GoodReads) | 99 |
    | setting | Story setting | 22 |
    | coverImg | URL to cover image | 99 |
    | bbeScore | Score in Best Books Ever list | 100 |
    | bbeVotes | Number of votes in Best Books Ever list | 100 |
    | price | Book's price (extracted from Iberlibro) | 73 |

  2. Z

    Crossref metadata of COCI bibliographic resources, as of November 2018 and...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Che, Chao (2020). Crossref metadata of COCI bibliographic resources, as of November 2018 and LCC categories of the ISBN entities in the dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3241744
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Yan, Erjia
    Che, Chao
    Zhu, Yongjun
    Peroni, Silvio
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The all.zip CSV file (zipped) contains citation counts obtained from the November 2018 dump of COCI (https://doi.org/10.6084/m9.figshare.6741422.v3) and some metadata (title, DOI, number of authors, ISBN, ISBN of the container, type of the bibliographic resource) of the related citing and cited entities obtained by using the Crossref dump downloaded in October 2018 – which is the same dump used to create the COCI data.

    In addition, it contains all the Library of Congress Classification (LCC) categories associated with each ISBN in the previous dataset (file isbn_cat_lcc.csv), according to the data retrieved using the services at http://classify.oclc.org/classify2/api_docs/index.html. Two ancillary mapping files have been also added: one (ddc_to_lcc_mapping.csv) for converting a Dewey Decimal Classification (DDC) categories into LCC categories, in the case the service mentioned above returned only DDC categories for some ISBN; the other (lcc_to_wos_mapping.csv) to map each LCC category into the related Web of Science research area.

  3. d

    Open e-commerce 1.0: Five years of crowdsourced U.S. Amazon purchase...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Dec 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alex Berke; Dan Calacci; Robert Mahari; Takahiro Yabe; Kent Larson; Sandy Pentland (2023). Open e-commerce 1.0: Five years of crowdsourced U.S. Amazon purchase histories with user demographics [Dataset]. http://doi.org/10.7910/DVN/YGLYDY
    Explore at:
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Alex Berke; Dan Calacci; Robert Mahari; Takahiro Yabe; Kent Larson; Sandy Pentland
    Description

    This dataset contains longitudinal purchases data from 5027 Amazon.com users in the US, spanning 2018 through 2022: amazon-purchases.csv It also includes demographic data and other consumer level variables for each user with data in the dataset. These consumer level variables were collected through an online survey and are included in survey.csv fields.csv describes the columns in the survey.csv file, where fields/survey columns correspond to survey questions. The dataset also contains the survey instrument used to collect the data. More details about the survey questions and possible responses, and the format in which they were presented can be found by viewing the survey instrument. A 'Survey ResponseID' column is present in both the amazon-purchases.csv and survey.csv files. It links a user's survey responses to their Amazon.com purchases. The 'Survey ResponseID' was randomly generated at the time of data collection. amazon-purchases.csv Each row in this file corresponds to an Amazon order. Each such row has the following columns: Survey ResponseID Order date Shipping address state Purchase price per unit Quantity ASIN/ISBN (Product Code) Title Category The data were exported by the Amazon users from Amazon.com and shared by users with their informed consent. PII and other information not listed above were stripped from the data. This processing occurred on users' machines before sharing with researchers.

  4. Activity Recognition

    • kaggle.com
    Updated Apr 15, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alex Kudin (2019). Activity Recognition [Dataset]. https://www.kaggle.com/datasets/avk256/activity-recognition/data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 15, 2019
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Alex Kudin
    Description

    Source:

    Uncalibrated Accelerometer Data are collected from 15 participantes performing 7 activities. The dataset provides challenges for identification and authentication of people using motion patterns.

    Data Set Information:

    --- The dataset collects data from a wearable accelerometer mounted on the chest --- Sampling frequency of the accelerometer: 52 Hz --- Accelerometer Data are Uncalibrated --- Number of Participants: 15 --- Number of Activities: 7 --- Data Format: CSV

    Attribute Information:

    --- Data are separated by participant --- Each file contains the following information ---- sequential number, x acceleration, y acceleration, z acceleration, label --- Labels are codified by numbers --- 1: Working at Computer --- 2: Standing Up, Walking and Going updown stairs --- 3: Standing --- 4: Walking --- 5: Going UpDown Stairs --- 6: Walking and Talking with Someone --- 7: Talking while Standing

    Relevant Papers:

    --- Casale, P. Pujol, O. and Radeva, P. 'BeaStreamer-v0.1: a new platform for Multi-Sensors Data Acquisition in Wearable Computing Applications', CVCRD09, ISBN: 978-84-937261-1-9, 2009 available on [Web Link]

    --- Casale, P. Pujol, O. and Radeva, P. 'Human activity recognition from accelerometer data using a wearable device', IbPRIA'11, 289-296, Springer-Verlag, 2011 available on [Web Link]

    --- Casale, P. Pujol, O. and Radeva, P. 'Personalization and user verification in wearable systems using biometric walking patterns' Personal and Ubiquitous Computing, 16(5), 563-580, 2012 available on [Web Link]

    Citation Request:

    Casale, P. Pujol, O. and Radeva, P. 'Personalization and user verification in wearable systems using biometric walking patterns' Personal and Ubiquitous Computing, 16(5), 563-580, 2012

  5. BlogCatalog dataset

    • figshare.com
    zip
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nitin Agarwal; Xufei Wang (2023). BlogCatalog dataset [Dataset]. http://doi.org/10.6084/m9.figshare.11923611.v3
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Nitin Agarwal; Xufei Wang
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Abstract: BlogCatalog is the social blog directory which manages the bloggers and their blogs.Number of Nodes:10,312Number of Edges:333,983Missing Values?noSource:Nitin Agarwal+, Xufei Wang*, Huan Liu*+ Department of Information Science, University of Arkansas at Little Rock. E-mail:nxagarwal@ualr.edu* School of Computing, Informatics and Decision Systems Engineering, Arizona State University. E-mail: huan.liu@asu.edu, xufei.wang@asu.eduData Set Information:2 files are included:1. nodes.csv-- it's the file of all the users. This file works as a dictionary of all the users in this data set. It's useful for fast reference. It contains all the node ids used in the dataset.2. edges.csv-- this is the friendship network among the bloggers. The blogger's friends are represented using edges. Here is an example.1,2This means blogger with id "1" is friend with blogger id "2".Attribute Information:This is the data set crawled on July, 2009 from BlogCatalog ( http://www.blogcatalog.com ). BlogCatalog is a social blog directory website. This contains the friendship network crawled. For easier understanding, all the contents are organized in CSV file format.-. Basic statisticsNumber of bloggers : 88,784Number of friendship pairs: 4,186,390Relevant Papers:Nitin Agarwal and Huan Liu. ”Modeling and Data Mining in Blogosphere”, Synthesis Lectures on Data Mining and Knowledge Discovery #1, Morgan & Claypool Publishers, Robert Grossman (Editor), August 2009. ISBN: 9781598299083 (paperback) ISBN: 9781598299090 (ebook) Nitin Agarwal, Magdiel Galan, Huan Liu, and Shankar Subramanya. WisColl: Collective Wisdom based Blog Clustering. Journal of Information Science, 180(1): 39-61, January, 2010. Nitin Agarwal, Huan Liu, Sudheendra Murthy, Arunabha Sen, and Xufei Wang. A Social Identity Approach to Identify Familiar Strangers in a Social Network. In Proceedings of the Third International AAAI Conference on Weblogs and Social Media (ICWSM09), pp. 2 - 9, May 17-20, 2009. San Jose, California. Nitin Agarwal, Huan Liu, Sudheendra Murthy, Arunabha Sen, and Xufei Wang. "A Social Identity Approach to Identify Familiar Strangers in a Social Network", 3rd International AAAI Conference on Weblogs and Social Media (ICWSM09), pp. 2 - 9, May 17-20, 2009. San Jose, California.

  6. SciGRID_gas LKD_Raw

    • zenodo.org
    bin, csv, pdf, zip
    Updated Jul 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jan Diettrich; Adam Pluta; Adam Pluta; Wided Medjroubi; Wided Medjroubi; Jan Diettrich (2024). SciGRID_gas LKD_Raw [Dataset]. http://doi.org/10.5281/zenodo.3985271
    Explore at:
    csv, bin, pdf, zipAvailable download formats
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Jan Diettrich; Adam Pluta; Adam Pluta; Wided Medjroubi; Wided Medjroubi; Jan Diettrich
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The LKD_raw dataset is an outcome of the SciGRID_gas project

    The data set contains geographical and meta information on the European gas transport network. The data originats from gas data of LKD_EU (ISBN: 978-3-86780-554-4) project. The original data repository can be found under 10.5281/zenodo.1044462.

    The original data has been partially cleaned up and converted to fit to the SciGRID_gas project data structure.

    The data is being stored in both CSV and GeoJSON files.

  7. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Lorena Casanova Lozano; Sergio Costa Planells; Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
Organization logo

Best Books Ever Dataset

Explore at:
3 scholarly articles cite this dataset (View in Google Scholar)
csvAvailable download formats
Dataset updated
Nov 10, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Lorena Casanova Lozano; Sergio Costa Planells; Lorena Casanova Lozano; Sergio Costa Planells
License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically

Description

The dataset has been collected in the frame of the Prac1 of the subject Tipology and Data Life Cycle of the Master's Degree in Data Science of the Universitat Oberta de Catalunya (UOC).

The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the larges list on the site).

Original code used to retrieve the dataset can be found on github repository: github.com/scostap/goodreads_bbe_dataset

The data was retrieved in two sets, the first 30000 books and then the remainig 22478. Dates were not parsed and reformated on the second chunk so publishDate and firstPublishDate are representet in a mm/dd/yyyy format for the first 30000 records and Month Day Year for the rest.

Book cover images can be optionally downloaded from the url in the 'coverImg' field. Python code for doing so and an example can be found on the github repo.

The 25 fields of the dataset are:

| Attributes | Definition | Completeness |
| ------------- | ------------- | ------------- | 
| bookId | Book Identifier as in goodreads.com | 100 |
| title | Book title | 100 |
| series | Series Name | 45 |
| author | Book's Author | 100 |
| rating | Global goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (ex. Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Editorial | 93 |
| publishDate | publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Number of total ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field, percent of ratings over 2 starts (as in GoodReads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL to cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |

Search
Clear search
Close search
Google apps
Main menu