21 datasets found
  1. Best Books Ever Dataset

    • zenodo.org
    csv
    Updated Nov 10, 2020
    Cite
    Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
    Explore at:
    csv (available download formats)
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lorena Casanova Lozano; Sergio Costa Planells
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset was collected as part of the Prac1 assignment of the subject Typology and Data Life Cycle in the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).

    The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).

    The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset

    The data was retrieved in two sets: the first 30000 books and then the remaining 22478. Dates were not parsed and reformatted in the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and in Month Day Year format for the rest.

    Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, and an example, can be found in the GitHub repo.

    The 25 fields of the dataset are:

    | Attributes | Definition | Completeness (%) |
    | ------------- | ------------- | ------------- | 
    | bookId | Book Identifier as in goodreads.com | 100 |
    | title | Book title | 100 |
    | series | Series Name | 45 |
    | author | Book's Author | 100 |
    | rating | Global goodreads rating | 100 |
    | description | Book's description | 97 |
    | language | Book's language | 93 |
    | isbn | Book's ISBN | 92 |
    | genres | Book's genres | 91 |
    | characters | Main characters | 26 |
    | bookFormat | Type of binding | 97 |
    | edition | Type of edition (ex. Anniversary Edition) | 9 |
    | pages | Number of pages | 96 |
    | publisher | Editorial | 93 |
    | publishDate | publication date | 98 |
    | firstPublishDate | Publication date of first edition | 59 |
    | awards | List of awards | 20 |
    | numRatings | Number of total ratings | 100 |
    | ratingsByStars | Number of ratings by stars | 97 |
    | likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
    | setting | Story setting | 22 |
    | coverImg | URL to cover image | 99 |
    | bbeScore | Score in Best Books Ever list | 100 |
    | bbeVotes | Number of votes in Best Books Ever list | 100 |
    | price | Book's price (extracted from Iberlibro) | 73 |

  2. Global Data Lab Area Database (PRIO mirror)

    • prio.org
    Updated Jan 28, 2025
    + more versions
    Cite
    Peace Research Institute Oslo (PRIO) (2025). Global Data Lab Area Database (PRIO mirror) [Dataset]. https://www.prio.org/data/35
    Explore at:
    Dataset updated
    Jan 28, 2025
    Dataset provided by
    Peace Research Institute Oslo (PRIO)
    Time period covered
    1992 - 2023
    Area covered
    Global
    Description

    PRIO is hosting a copy of this dataset with permission from Global Data Lab. Please see their webpage for more information about this data.

  3. NIST Statistical Reference Datasets - SRD 140

    • catalog.data.gov
    • datasets.ai
    • +2 more
    Updated Jul 29, 2022
    + more versions
    Cite
    National Institute of Standards and Technology (2022). NIST Statistical Reference Datasets - SRD 140 [Dataset]. https://catalog.data.gov/dataset/nist-statistical-reference-datasets-srd-140-df30c
    Explore at:
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software. Currently, datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance.

    The collection includes both generated and 'real-world' data of varying levels of difficulty. Generated datasets are designed to challenge specific computations. These include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions. The certification procedure is described in the web pages for each statistical method.

    Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking, the level of difficulty of a dataset depends on the algorithm; these levels are merely provided as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software. The Statistical Reference Datasets project is also supported by the Standard Reference Data Program.
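    The value of such reference datasets is easy to demonstrate: two algebraically equivalent formulas can disagree badly in floating point. A small illustration with made-up data (the certified NIST values live on the SRD pages themselves):

```python
def variance_naive(xs):
    # One-pass textbook formula E[x^2] - E[x]^2: algebraically correct,
    # but numerically fragile when the mean is large relative to the spread.
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

def variance_two_pass(xs):
    # Two-pass formula: compute the mean first, then sum squared
    # deviations. Numerically much safer.
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n

# Made-up data with a large offset, in the spirit of the generated
# "higher difficulty" univariate datasets; the true population
# variance here is 22.5.
data = [1e8 + 4, 1e8 + 7, 1e8 + 13, 1e8 + 16]
```

    On this data the two-pass formula returns 22.5 exactly, while the one-pass formula loses several significant digits to cancellation, which is precisely the kind of defect the certified values are designed to expose.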

  4. Ten Thousand German News Articles Dataset

    • kaggle.com
    • tblock.github.io
    zip
    Updated Jan 20, 2022
    Cite
    Timo Block (2022). Ten Thousand German News Articles Dataset [Dataset]. https://www.kaggle.com/tblock/10kgnad
    Explore at:
    zip (21144764 bytes; available download formats)
    Dataset updated
    Jan 20, 2022
    Authors
    Timo Block
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    (see https://tblock.github.io/10kGNAD/ for the original dataset page)

    This page introduces the 10k German News Articles Dataset (10kGNAD), a German topic classification dataset. The 10kGNAD is based on the One Million Posts Corpus and available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You can download the dataset here.

    Why a German dataset?

    English text classification datasets are common. Examples are the big AG News, the class-rich 20 Newsgroups and the large-scale DBpedia ontology datasets for topic classification, and, for example, the commonly used IMDb and Yelp datasets for sentiment analysis. Non-English datasets, especially German datasets, are less common. There is a collection of sentiment analysis datasets assembled by the Interest Group on German Sentiment Analysis. However, to my knowledge, no German topic classification dataset is available to the public.

    Due to grammatical differences between the English and the German language, a classifier might be effective on an English dataset but not as effective on a German dataset. German is more highly inflected, and long compound words are quite common compared to English. One would need to evaluate a classifier on multiple German datasets to get a sense of its effectiveness.

    The dataset

    The 10kGNAD dataset is intended to solve part of this problem as the first German topic classification dataset. It consists of 10273 German-language news articles from an Austrian online newspaper, categorized into nine topics. These articles are a previously unused part of the One Million Posts Corpus.

    In the One Million Posts Corpus each article has a topic path, for example Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise. The 10kGNAD uses the second part of the topic path, here Wirtschaft, as the class label. As a result the dataset can be used for multi-class classification.
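    The label-derivation rule described above is a one-liner; a sketch (the function name is mine):

```python
def topic_label(topic_path):
    # The second component of the One Million Posts topic path is the
    # 10kGNAD class label, e.g. 'Wirtschaft' in the example above.
    return topic_path.split("/")[1]
```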

    I created and used this dataset in my thesis to train and evaluate four text classifiers on the German language. By publishing the dataset I hope to support the advancement of tools and models for the German language. Additionally, this dataset can be used as a benchmark dataset for German topic classification.

    Numbers and statistics

    As in most real-world datasets, the class distribution of the 10kGNAD is not balanced. The biggest class, Web, consists of 1678 articles, while the smallest class, Kultur, contains only 539 articles. However, articles from the Web class have on average the fewest words, while articles from the Kultur class have the second most words.

    Splitting into train and test

    I propose a stratified split of 10% for testing and the remaining articles for training. To use the dataset as a benchmark dataset, please use the train.csv and test.csv files located in the project root.
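    For re-creating a comparable split from the raw articles (the canonical benchmark split is the provided train.csv/test.csv), a plain-Python stratified split might look like this (helper and parameter names are mine):

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.10, seed=42):
    """Return (train_idx, test_idx) index lists, holding out roughly
    test_frac of each class, mirroring the proposed 10% stratified
    test split."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        k = round(len(idxs) * test_frac)
        test.extend(idxs[:k])
        train.extend(idxs[k:])
    return train, test
```

    Stratifying per class keeps the imbalanced class distribution (Web vs. Kultur) the same in both partitions.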

    Code

    Python scripts to extract the articles and split them into a train and a test set are available in the code directory of this project. Make sure to install the requirements. The original corpus.sqlite3 is required to extract the articles (download here (compressed) or here (uncompressed)).

    License


    This dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please consider citing the authors of the One Million Post Corpus if you use the dataset.

  5. LAS&T: Large Shape And Texture Dataset

    • zenodo.org
    jpeg, zip
    Updated May 26, 2025
    Cite
    Sagi Eppel (2025). LAS&T: Large Shape And Texture Dataset [Dataset]. http://doi.org/10.5281/zenodo.15453634
    Explore at:
    jpeg, zip (available download formats)
    Dataset updated
    May 26, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sagi Eppel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Large Shape And Texture Dataset (LAS&T)

    LAS&T is the largest and most diverse dataset for shape, texture, and material recognition and retrieval in 2D and 3D, with 650,000 images based on real-world shapes and textures.

    Overview

    The LAS&T Dataset aims to test the most basic aspect of vision in the most general way: the ability to identify any shape, texture, and material in any setting and environment, without being limited to specific types or classes of objects, materials, and environments. For shapes, this means identifying and retrieving any shape in 2D or 3D with every element of the shape changed between images, including the shape's material and texture, orientation, size, and environment. For textures and materials, the goal is to recognize the same texture or material when it appears on different objects, in different environments, and under different light conditions. The dataset relies on shapes, textures, and materials extracted from real-world images, leading to an almost unlimited quantity and diversity of real-world natural patterns. Each section of the dataset (shapes and textures) contains a 3D part that relies on physics-based scenes with realistic light, material, and object simulation, and an abstract 2D part. In addition, there is a real-world benchmark for 3D shapes.

    Main Dataset webpage

    The dataset contains four parts:

    3D shape recognition and retrieval.

    2D shape recognition and retrieval.

    3D Materials recognition and retrieval.

    2D Texture recognition and retrieval.

    Each can be used independently for training and testing.

    Additional assets are a set of 350,000 natural 2D shapes extracted from real-world images (SHAPES_COLLECTION_350k.zip)

    3D shape recognition real-world images benchmark

    The scripts used to generate and test the dataset are supplied in the SCRIPTS files.

    Shapes Recognition and Retrieval:

    For shape recognition the goal is to identify the same shape in different images, where the material/texture/color of the shape is changed, the shape is rotated, and the background is replaced; hence, only the shape remains the same in both images. This is tested for 3D shapes/objects with realistic light simulation. All files with 3D shapes contain samples of the 3D shape dataset, and all files with 2D shapes contain samples of the 2D shape dataset. Example files contain images with examples for each set.

    Main files:

    Real_Images_3D_shape_matching_Benchmarks.zip contains real-world image benchmarks for 3D shapes.

    3D_Shape_Recognition_Synthethic_GENERAL_LARGE_SET_76k.zip: A large number of synthetic examples of 3D shapes with maximum variability; can be used for training/testing 3D shape/object recognition/retrieval.

    2D_Shapes_Recognition_Textured_Synthetic_Resize2_GENERAL_LARGE_SET_61k.zip: A large number of synthetic examples of 2D shapes with maximum variability; can be used for training/testing 2D shape recognition/retrieval.

    SHAPES_2D_365k.zip: 365,000 2D shapes extracted from real-world images, saved as black-and-white .png image files.

    File structure:

    All jpg images that are in the exact same subfolder contain the exact same shape (but with different texture/color/background/orientation).

    Textures and Materials Recognition and Retrieval

    For texture and materials, the goal is to identify and match images containing the same material or textures, however the shape/object on which the material texture is applied is different, and so is the background and light.

    This is done for physics-based material in 3D and abstract 2D textures.

    3D_Materials_PBR_Synthetic_GENERAL_LARGE_SET_80K.zip: A large number of examples of 3D materials in physics-grounded scenes; can be used for training or testing of material recognition/retrieval.

    2D_Textures_Recogition_GENERAL_LARGE_SET_Synthetic_53K.zip: A large number of images of 2D textures with maximum variability of setting; can be used for training/testing 2D texture recognition/retrieval.

    File structure:

    All jpg images that are in the exact same subfolder contain the exact same texture/material (but overlay on different objects with different background/and illumination/orientation).

    Data Generation:

    The images in the synthetic part of the dataset were created by automatically extracting shapes and textures from natural images and combining them in synthetic images. This creates synthetic images that rely completely on real-world patterns, yielding extremely diverse and complex shapes and textures. As far as we know, this is the largest and most diverse shape and texture recognition/retrieval dataset. 3D data was generated using physics-based materials and rendering (Blender), making the images physically grounded and enabling use of the data to train for real-world examples. The scripts for generating the data are supplied in files with the word SCRIPTS in them.

    Real-world image data:

    For 3D shape recognition and retrieval, we also supply a real-world natural-image benchmark, with a variety of natural images containing the exact same 3D shape but made/coated with different materials and in different environments and orientations. The goal is again to identify the same shape in different images. The benchmark is available in: Real_Images_3D_shape_matching_Benchmarks.zip

    File structure:

    Files containing the word 'GENERAL_LARGE_SET' contain synthetic images that can be used for training or testing; the type of data (2D shapes, 3D shapes, 2D textures, 3D materials) appears in the file name, as does the number of images. Files containing 'MultiTests' contain a number of different tests in which only a single aspect of the instance is changed (for example, only the background). Files containing 'SCRIPTS' contain data generation and testing scripts. Files containing 'examples' show an example of each test.

    Shapes Collections

    The file SHAPES_COLLECTION_350k.zip contains 350,000 2D shapes extracted from natural images and used for the dataset generation.

    Evaluating and Testing

    For evaluating and testing see: SCRIPTS_Testing_LVLM_ON_LAST_VQA.zip
    This can be used to test leading LVLMs via API, create human tests, and in general turn the dataset into multiple-choice question images similar to the ones in the paper.

  6. Copy (3) Southwest Watershed Research Center Online Data Access

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1 more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Copy (3) Southwest Watershed Research Center Online Data Access [Dataset]. https://catalog.data.gov/dataset/copy-3-southwest-watershed-research-center-online-data-access-18dca
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    Hydrologic data, primarily precipitation and runoff, have been collected on experimental watersheds operated by the U.S. Department of Agriculture Agricultural Research Service (USDA-ARS) and on other lands in southeastern Arizona since the 1950s. These data are of national and international importance and make up one of the most comprehensive semiarid watershed data sets in the world. The USDA-ARS Southwest Watershed Research Center has recently developed an electronic data processing system that includes an online interface (https://tucson.ars.ag.gov/dap) to provide public access to the data. The goal of the system is to promote analyses and interpretations of historic and current data by improving data access. The publicly accessible part of the system consists of an interactive Web site, which provides an interface to the data, and a relational database, which is used to process, store, and manage data. In addition, DAP was expanded to put sediment, meteorological, soil moisture and temperature, vegetation, CO2 and water flux, geographic information system (GIS), and aircraft and satellite spectral imagery data online, and to publish metadata for all WGEW long-term measurements.

    Resources in this dataset: Resource Title: Web Page. File Name: WGEWsoils.xls, url: https://www.tucson.ars.ag.gov/dap/Files/WGEWsoils.xls

  7. Data from: Cacao Genome Database

    • catalog.data.gov
    • datasets.ai
    • +2 more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Cacao Genome Database [Dataset]. https://catalog.data.gov/dataset/cacao-genome-database-0d068
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    Not only is cacao the basic ingredient in the world’s favorite confection, chocolate, but it provides a livelihood for over 6.5 million farmers in Africa, South America and Asia and ranks as one of the top ten agriculture commodities in the world. Historically, cocoa production has been plagued by serious losses due to pests and diseases. The release of the cacao genome sequence will provide researchers with access to the latest genomic tools, enabling more efficient research and accelerating the breeding process, thereby expediting the release of superior cacao cultivars. The sequenced genotype, Matina 1-6, is representative of the genetic background most commonly found in the cacao producing countries, enabling results to be applied immediately and broadly to current commercial cultivars. Matina 1-6 is highly homozygous, which greatly reduces the complexity of the sequence assembly process. While the sequence provided is a preliminary release, it already covers 92% of the genome, with approximately 35,000 genes. We will continue to refine the assembly and annotation, working toward a complete finished sequence. Updates will be made available via the main project website.

    Resources in this dataset: Resource Title: Cacao Genome Database. File Name: Web Page, url: http://www.cacaogenomedb.org/

  8. SAPFLUXNET: A global database of sap flow measurements

    • zenodo.org
    • explore.openaire.eu
    • +1 more
    zip
    Updated Sep 26, 2020
    Cite
    Rafael Poyatos; Víctor Granda; Víctor Flo; Roberto Molowny-Horas; Kathy Steppe; Maurizio Mencuccini; Jordi Martínez-Vilalta (2020). SAPFLUXNET: A global database of sap flow measurements [Dataset]. http://doi.org/10.5281/zenodo.3697807
    Explore at:
    zip (available download formats)
    Dataset updated
    Sep 26, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rafael Poyatos; Víctor Granda; Víctor Flo; Roberto Molowny-Horas; Kathy Steppe; Maurizio Mencuccini; Jordi Martínez-Vilalta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General description

    SAPFLUXNET contains a global database of sap flow and environmental data, together with metadata at different levels.
    SAPFLUXNET is a harmonised database, compiled from contributions from researchers worldwide. This version (0.1.4) contains more than 200 datasets from all over the world, covering a broad range of bioclimatic conditions.
    More information on the coverage can be found here: http://sapfluxnet.creaf.cat/shiny/sfn_progress_dashboard/.


    The SAPFLUXNET project has been developed by researchers at CREAF and other institutions (http://sapfluxnet.creaf.cat/#team), coordinated by Rafael Poyatos (CREAF, http://www.creaf.cat/staff/rafael-poyatos-lopez), and funded by two Spanish Young Researcher's Grants (SAPFLUXNET, CGL2014-55883-JIN; DATAFORUSE, RTI2018-095297-J-I00) and an Alexander von Humboldt Research Fellowship for Experienced Researchers.

    Variables and units

    SAPFLUXNET contains whole-plant sap flow and environmental variables at sub-daily temporal resolution. Both sap flow and environmental time series have accompanying flags in a data frame, one for sap flow and another for environmental
    variables. These flags store quality issues detected during the quality control process and can be used to add further quality flags.

    Metadata contain relevant variables informing about site conditions, stand characteristics, tree and species attributes, sap flow methodology and details on environmental measurements. To learn more about variables, units and data flags please use the functionalities implemented in the sapfluxnetr package (https://github.com/sapfluxnet/sapfluxnetr). In particular, have a look at the package vignettes using R:

    # remotes::install_github(
    #  'sapfluxnet/sapfluxnetr',
    #  build_opts = c("--no-resave-data", "--no-manual", "--build-vignettes")
    # )
    library(sapfluxnetr)
    # to list all vignettes
    vignette(package='sapfluxnetr')
    # variables and units
    vignette('metadata-and-data-units', package='sapfluxnetr')
    # data flags
    vignette('data-flags', package='sapfluxnetr')

    Data formats

    SAPFLUXNET data can be found in two formats: 1) RData files belonging to the custom-built 'sfn_data' class and 2) Text files in .csv format. We recommend using the sfn_data objects together with the sapfluxnetr package, although we also provide the text files for convenience. For each dataset, text files are structured in the same way as the slots of sfn_data objects; if working with text files, we recommend that you check the data structure of 'sfn_data' objects in the corresponding vignette.

    Working with sfn_data files

    To work with SAPFLUXNET data, first they have to be downloaded from Zenodo, maintaining the folder structure. A first level in the folder hierarchy corresponds to file format, either RData files or csv's. A second level corresponds to how sap flow is expressed: per plant, per sapwood area or per leaf area. Please note that interconversions among the magnitudes have been performed whenever possible. Below this level, data have been organised per dataset. In the case of RData files, each dataset is contained in a sfn_data object, which stores all data and metadata in different slots (see the vignette 'sfn-data-classes'). In the case of csv files, each dataset has 9 individual files, corresponding to metadata (5), sap flow and environmental data (2) and their corresponding data flags (2).

    After downloading the entire database, the sapfluxnetr package can be used to:
    - Work with data from a single site: data access, plotting and time aggregation.
    - Select the subset datasets to work with.
    - Work with data from multiple sites: data access, plotting and time aggregation.

    Please check the following package vignettes to learn more about how to work with sfn_data files:

    Quick guide

    Metadata and data units

    sfn_data classes

    Custom aggregation

    Memory and parallelization

    Working with text files

    We recommend working with sfn_data objects using R and the sapfluxnetr package; we do not currently provide code to work with text files.

    Data issues and reporting

    Please report any issue you may find in the database by sending us an email: sapfluxnet@creaf.uab.cat.

    Temporary data fixes, detected but not yet included in released versions, will be published on the SAPFLUXNET main web page ('Known data errors').

    Data access, use and citation

    This version of the SAPFLUXNET database is open access. We are working on a data paper describing the database, but, before its publication, please cite this Zenodo entry if SAPFLUXNET is used in any publication.

  9. E-commerce - Users of a French C2C fashion store

    • kaggle.com
    Updated Feb 24, 2024
    Cite
    Jeffrey Mvutu Mabilama (2024). E-commerce - Users of a French C2C fashion store [Dataset]. https://www.kaggle.com/jmmvutu/ecommerce-users-of-a-french-c2c-fashion-store/notebooks
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 24, 2024
    Dataset provided by
    Kaggle
    Authors
    Jeffrey Mvutu Mabilama
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    Foreword

    This users dataset is a preview of a much bigger dataset, with lots of related data (product listings of sellers, comments on listed products, etc.).

    My Telegram bot will answer your queries and allow you to contact me.

    Context

    There are a lot of unknowns when running an E-commerce store, even when you have analytics to guide your decisions.

    Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other user's articles).

    This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you may want to try to understand what you can expect of your users and determine in advance what your growth may look like.

    • For instance, if you see that most of your users are not very active, you may look into this dataset to compare your store's performance.

    If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and be able to adapt my datasets to better suit your needs.

    This dataset is part of a preview of a much larger dataset. Please contact me for more.

    Content

    The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009 then expanded worldwide.

    Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Questions you might want to answer using this dataset:

    • Are e-commerce users interested in social network features?
    • Are my users active enough (compared to those of this dataset)?
    • How likely are people from other countries to sign up on a C2C website?
    • How many users are likely to drop off after years of using my service?

    Example works:

    • Report(s) made using SQL queries can be found on the data.world page of the dataset.
    • Notebooks may be found on the Kaggle page of the dataset.

    License

    CC-BY-NC-SA 4.0

    For other licensing options, contact me.

  10. IMDb Top Rated English Movies

    • kaggle.com
    Updated Nov 10, 2023
    Cite
    Alex Quilis (2023). IMDb Top Rated English Movies [Dataset]. https://www.kaggle.com/datasets/alexq1111/imdb-top-rated-english-movies/versions/1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 10, 2023
    Dataset provided by
    Kaggle
    Authors
    Alex Quilis
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    I scraped data from IMDb to create a dataset of top-rated English movies. It includes movie names, release years, ratings, and user votes. The goal is to provide a valuable resource for movie enthusiasts and data analysts.

    Sources: The data comes directly from IMDb, a popular movie information platform. I used web scraping to extract details from IMDb pages, ensuring the dataset is accurate and comprehensive.

    Educational Intent: The entire data collection effort was driven by educational purposes, aiming to provide a curated dataset for analysis and exploration. Users are encouraged to leverage the dataset for educational and non-commercial purposes while being mindful of IMDb's terms of service.

    Inspiration for Skill Improvement: This project helped me improve my web scraping skills, especially in navigating HTML structures and handling data extraction. I also honed my data cleaning and preprocessing abilities to ensure the dataset's quality. Analyzing and visualizing the data further improved my data analysis skills. Overall, this practical project enhanced my proficiency in handling real-world datasets.

  11. Data from: Datasets for transcriptomic analyses of maize leaves in response...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Data from: Datasets for transcriptomic analyses of maize leaves in response to Asian corn borer feeding and/or jasmonic acid [Dataset]. https://catalog.data.gov/dataset/data-from-datasets-for-transcriptomic-analyses-of-maize-leaves-in-response-to-asian-corn-b-d9ac5
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    Corn (Zea mays) is one of the most widely grown crops throughout the world. However, many corn fields develop pest problems such as corn borers every year that seriously affect yield and quality. Corn's response to initial insect damage involves a variety of changes to the levels of defensive enzymes, toxins, and communicative volatiles. Such a dramatic change in secondary metabolism necessitates the regulation of gene expression at the transcript level. This Data in Brief paper summarizes the datasets of the transcriptome of corn plants in response to Asian corn borers (Ostrinia furnacalis) and/or methyl jasmonate (MeJA). Altogether, 39,636 genes were found to be differentially expressed. The sequencing data are available in the NCBI SRA database under accession number SRS965087. This dataset will provide scientific and valuable information for future work, such as studying the functions of important genes or proteins and developing new insect-resistant maize varieties. Includes supplementary tables and data in fasta and GTF format.

    Resources in this dataset: Resource Title: Datasets for transcriptomic analyses of maize leaves in response to Asian corn borer feeding and/or jasmonic acid. File Name: Web Page, url: https://www.sciencedirect.com/science/article/pii/S2352340916301792 (Data in Brief article including supplemental data in fasta and GTF format).
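    Differential-expression calls like the 39,636 genes reported above typically rest on fold-change and significance filtering. The sketch below illustrates only the fold-change filter; the threshold, pseudocount, and toy counts are illustrative assumptions, not the study's actual pipeline (which would use a tool such as DESeq2 or edgeR with dispersion modelling and multiple-testing correction):

```python
import math

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of normalized counts; the pseudocount avoids log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

def candidate_degs(expr, threshold=1.0):
    """Flag genes whose |log2 fold change| meets the threshold.

    `expr` maps gene -> (treated_count, control_count). This is only the
    effect-size filter, not a full differential-expression test.
    """
    return [g for g, (t, c) in expr.items()
            if abs(log2_fold_change(t, c)) >= threshold]

expr = {"geneA": (400.0, 100.0), "geneB": (110.0, 100.0), "geneC": (20.0, 90.0)}
print(candidate_degs(expr))  # ['geneA', 'geneC']
```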

  12. Tracking the Trackers Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated May 5, 2021
    + more versions
    Cite
    (2016). Tracking the Trackers Dataset [Dataset]. https://paperswithcode.com/dataset/tracking-the-trackers
    Explore at:
    Dataset updated
    May 5, 2021
    Description

    Tracking the Trackers is a large-scale analysis of third-party trackers on the World Wide Web. We extract third-party embeddings from more than 3.5 billion web pages of the CommonCrawl 2012 corpus, and aggregate those to a dataset containing more than 140 million third-party embeddings in over 41 million domains.
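    The aggregation step described above, from per-page third-party embeddings to per-domain counts, can be sketched in a few lines. This is a toy illustration of the counting step only, not the authors' CommonCrawl pipeline, and the domain names are invented:

```python
from collections import defaultdict

def count_embedding_domains(pairs):
    """Map each third party to the number of distinct domains embedding it.

    `pairs` is an iterable of (page_domain, third_party_domain) tuples,
    standing in for embeddings extracted from crawled pages.
    """
    embedders = defaultdict(set)
    for page_domain, third_party in pairs:
        embedders[third_party].add(page_domain)
    return {tp: len(domains) for tp, domains in embedders.items()}

pairs = [
    ("example.com", "tracker-a.net"),
    ("example.com", "tracker-b.net"),
    ("news.example.org", "tracker-a.net"),
]
print(count_embedding_domains(pairs))  # {'tracker-a.net': 2, 'tracker-b.net': 1}
```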

  13. ‘QS World University Rankings 2017 - 2022’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 1, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘QS World University Rankings 2017 - 2022’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-qs-world-university-rankings-2017-2022-7fc4/d793e726/?iid=007-103&v=presentation
    Explore at:
    Dataset updated
    Aug 1, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘QS World University Rankings 2017 - 2022’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/padhmam/qs-world-university-rankings-2017-2022 on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    QS World University Rankings is an annual publication of global university rankings by Quacquarelli Symonds. The QS ranking receives approval from the International Ranking Expert Group (IREG) and is viewed as one of the three most widely read university rankings in the world. QS publishes its university rankings in partnership with Elsevier.

    Content

    This dataset contains university data from the year 2017 to 2022. It has a total of 15 features:

    - university - name of the university
    - year - year of ranking
    - rank_display - rank given to the university
    - score - score of the university based on the six key metrics mentioned above
    - link - link to the university profile page on the QS website
    - country - country in which the university is located
    - city - city in which the university is located
    - region - continent in which the university is located
    - logo - link to the logo of the university
    - type - type of university (public or private)
    - research_output - quality of research at the university
    - student_faculty_ratio - number of students assigned per faculty member
    - international_students - number of international students enrolled at the university
    - size - size of the university in terms of area
    - faculty_count - number of faculty or academic staff at the university
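    As a quick illustration, the country and international_students features can be combined to ask which countries enrol the most international students. This is a sketch; the string-formatted counts with thousands separators are an assumption about the raw CSV:

```python
def international_students_by_country(rows):
    """Sum the international_students feature per country.

    `rows` stands in for parsed CSV records (e.g. from csv.DictReader).
    Counts are assumed to be strings, possibly with thousands separators.
    """
    totals = {}
    for row in rows:
        raw = str(row.get("international_students", "") or "0")
        count = int(raw.replace(",", ""))
        totals[row["country"]] = totals.get(row["country"], 0) + count
    return totals

rows = [
    {"university": "A", "country": "United Kingdom", "international_students": "7,925"},
    {"university": "B", "country": "United Kingdom", "international_students": "1,000"},
    {"university": "C", "country": "Singapore", "international_students": "500"},
]
print(international_students_by_country(rows))  # {'United Kingdom': 8925, 'Singapore': 500}
```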

    Acknowledgements

    This dataset was acquired by scraping the QS World University Rankings website with Python and Selenium. Cover Image: Source

    Inspiration

    Some of the questions that can be answered with this dataset:

    1. What makes a top-ranked university?
    2. Does the location of a university play a role in its ranking?
    3. What do the best universities have in common?
    4. How important is academic research for a university?
    5. Which country is preferred by international students?

    --- Original source retains full ownership of the source dataset ---

  14. Wikipedia Article Titles

    • kaggle.com
    Updated Sep 22, 2017
    Cite
    Aleksey Bilogur (2017). Wikipedia Article Titles [Dataset]. https://www.kaggle.com/residentmario/wikipedia-article-titles/kernels
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 22, 2017
    Dataset provided by
    Kaggle
    Authors
    Aleksey Bilogur
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    Wikipedia, the world's largest encyclopedia, is a crowdsourced open knowledge project and website with millions of individual web pages. This dataset is a grab of the title of every article on Wikipedia as of September 20, 2017.

    Content

    This dataset is a simple newline (\n) delimited list of article titles. No distinction is made between redirects (like Schwarzenegger) and actual article pages (like Arnold Schwarzenegger).
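    Several of the inspiration questions below reduce to one-liners once the list is loaded. A minimal sketch (underscore tokenisation is an assumption about how multi-word titles are stored; use .split() instead if they use spaces):

```python
from collections import Counter

def title_stats(titles):
    """Return (longest title, shortest title, most common token)."""
    longest = max(titles, key=len)
    shortest = min(titles, key=len)
    # Assumes words within a title are separated by underscores,
    # as in Wikipedia URL-style titles.
    tokens = Counter(tok.lower() for t in titles for tok in t.split("_"))
    return longest, shortest, tokens.most_common(1)[0][0]

titles = ["Arnold_Schwarzenegger", "Schwarzenegger", "List_of_sovereign_states"]
print(title_stats(titles))
# ('List_of_sovereign_states', 'Schwarzenegger', 'schwarzenegger')
```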

    Acknowledgements

    This dataset was created by scraping Special:AllPages on Wikipedia. It was originally shared here.

    Inspiration

    • What are common article title tokens? How do they compare against frequent words in the English language?
    • What is the longest article title? The shortest?
    • What countries are most popular within article titles?

  15. Vatican Data, Year of Statistical Data

    • hub.arcgis.com
    • catholic-geo-hub-cgisc.hub.arcgis.com
    Updated Oct 22, 2019
    Cite
    burhansm2 (2019). Vatican Data, Year of Statistical Data [Dataset]. https://hub.arcgis.com/maps/36fcd8c2e2b04b48bcbc19602dcda867
    Explore at:
    Dataset updated
    Oct 22, 2019
    Dataset authored and provided by
    burhansm2
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Area covered
    Description

    Vatican Data Series {title at top of page}
    Data Developers: Burhans, Molly A., Cheney, David M., Emege, Thomas, Gerlt, R. “Vatican Data Series {title at top of page}”. Scale not given. Version 1.0. MO and CT, USA: GoodLands Inc., Catholic Hierarchy, Environmental Systems Research Institute, Inc., 2019.
    Web map developer: Molly Burhans, October 2019
    Web app developer: Molly Burhans, October 2019

    GoodLands’ polygon data layers, version 2.0, for global ecclesiastical boundaries of the Roman Catholic Church: Although care has been taken to ensure the accuracy, completeness and reliability of the information provided, this is the first dataset of global ecclesiastical boundaries curated from many sources, so it may have a higher margin of error than established geopolitical administrative boundary maps. Boundaries need to be verified with appropriate Ecclesiastical Leadership. The current information is subject to change without notice. No parties involved with the creation of this data are liable for indirect, special or incidental damage resulting from, arising out of or in connection with the use of the information.

    We referenced 1960 sources to build our global datasets of ecclesiastical jurisdictions. Often they were isolated images of dioceses, historical documents, and information about parishes that were cross-checked. These sources can be viewed here: https://docs.google.com/spreadsheets/d/11ANlH1S_aYJOyz4TtG0HHgz0OLxnOvXLHMt4FVOS85Q/edit#gid=0

    To learn more or contact us, please visit: https://good-lands.org/

    The Catholic Leadership global maps information is derived from the Annuario Pontificio, which is curated and published annually by the Vatican Statistics Office and digitized by David Cheney at Catholic-Hierarchy.org; updates are supplemented with diocesan and news announcements. GoodLands maps this into global ecclesiastical boundaries.

    Admin 3 Ecclesiastical Territories: Burhans, Molly A., Cheney, David M., Gerlt, R. “Admin 3 Ecclesiastical Territories For Web”. Scale not given. Version 1.2. MO and CT, USA: GoodLands Inc., Environmental Systems Research Institute, Inc., 2019.

    Derived from Global Diocesan Boundaries: Burhans, M., Bell, J., Burhans, D., Carmichael, R., Cheney, D., Deaton, M., Emge, T., Gerlt, B., Grayson, J., Herries, J., Keegan, H., Skinner, A., Smith, M., Sousa, C., Trubetskoy, S. “Diocesan Boundaries of the Catholic Church” [Feature Layer]. Scale not given. Version 1.2. Redlands, CA, USA: GoodLands Inc., Environmental Systems Research Institute, Inc., 2016. Using: ArcGIS 10.4, Version 10.0. Redlands, CA: Environmental Systems Research Institute, Inc., 2016.

    Boundary Provenance, Statistics and Leadership Data: Cheney, D.M. “Catholic Hierarchy of the World” [Database]. Date Updated: August 2019. Catholic Hierarchy. Using: Paradox. Retrieved from Original Source. Annuario Pontificio per l’Anno. Città del Vaticano: Tipografia Poliglotta Vaticana, Multiple Years.

    The data for these maps was extracted from the gold standard of Church data, the Annuario Pontificio, published yearly by the Vatican. The collection and data-development methods of the Vatican Statistics Office are unknown. GoodLands is not responsible for errors within this data. We encourage people to document and report errant information to us at data@good-lands.org or directly to the Vatican. Additional information about regular changes in bishops and sees comes from a variety of public diocesan and news announcements.

  16. USDA Nematode Collection Database

    • agdatacommons.nal.usda.gov
    • catalog.data.gov
    • +2more
    bin
    Updated Nov 30, 2023
    + more versions
    Cite
    U.S. Department of Agriculture, Agricultural Research Service (2023). USDA Nematode Collection Database [Dataset]. http://doi.org/10.15482/USDA.ADC/1326824
    Explore at:
    binAvailable download formats
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    United States Department of Agriculture (http://usda.gov/)
    Agricultural Research Service (https://www.ars.usda.gov/)
    Authors
    U.S. Department of Agriculture, Agricultural Research Service
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    The USDA Nematode Collection is one of the largest and most valuable nematode collections in existence. It contains over 49,000 permanent slides and vials, with a total repository of nematode specimens reaching several million, including the Cobb-Steiner, Thorne, and other valuable collections. Nematodes contained in this collection originate from worldwide sources. The USDA Nematode Collection Database contains over 38,000 species entries. A broad range of data is stored for each specimen, including species, host, origin, collector, date collected and date received. All records are searchable and available to the public through the online database. The physical collection is housed at the USDA Nematology Laboratory in Beltsville, MD. Specimens are available for loan to scientists who cannot personally visit the collection. Please see the Policy for Loaning USDANC Specimens for more information on this process. Scientists and other workers are always welcomed and encouraged to deposit material into the collection.

    Resources in this dataset: Resource Title: USDA Nematode Collection Database. File Name: Web Page, url: https://nt.ars-grin.gov/nematodes/search.cfm (the database portal for this collection).

  17. How to Log Into My PC Matic Account? Dataset

    • paperswithcode.com
    Updated Jun 17, 2025
    Cite
    (2025). How to Log Into My PC Matic Account? Dataset [Dataset]. https://paperswithcode.com/dataset/how-to-log-into-my-pc-matic-account
    Explore at:
    Dataset updated
    Jun 17, 2025
    Description

    To log in to your PC Matic account, please visit: 👉 https://login.maticpcaccount.com/

    Login to your PC Matic account

    PC Matic is a powerful antivirus and system optimization software designed to protect your devices from malware, ransomware, and performance issues. If you’ve already created an account, logging into your PC Matic account is simple. Follow the steps below to access your dashboard, manage your devices, or renew your subscription.

    In today’s digital world, keeping your devices secure is more important than ever. PC Matic has become a trusted name in cybersecurity and performance optimization for both home and business users. But to make the most of its features, one essential step is knowing how to log into your PC Matic account effectively.

    Whether you're new to the platform or simply need a refresher, this comprehensive guide walks you through everything you need to know to log in, manage your dashboard, and keep your account secure.

    Why Logging Into Your PC Matic Account Matters

    Your PC Matic account is the control center for all your antivirus and optimization tools. When you log in to your PC Matic account, you gain access to:

    Real-time threat monitoring

    Scheduled and manual scans

    Subscription and billing details

    Device management across multiple platforms

    Technical support and account settings

    Simply put, if you're using PC Matic software, logging into your account is critical for staying protected and in control.

    Step-by-Step: How to Log Into My PC Matic Account

    Here’s a simple walkthrough to help you access your PC Matic account without confusion.

    Open Your Web Browser: Begin by launching your preferred internet browser, whether it's Google Chrome, Firefox, Safari, or Microsoft Edge.

    Navigate to the PC Matic Website: In the address bar, type in “PC Matic” and select the official site from the search results. This is where you’ll find the login option, usually located at the top right corner of the homepage.

    Click on “Login”: Once you're on the homepage, click on the Login or My Account button to begin the process of accessing your account.

    Enter Your Email and Password: You’ll be prompted to enter your registered email address and the password associated with your account. Be sure to double-check for typos or case-sensitive characters.

    Click “Log In”: After entering your credentials, click the Log In button. You’ll now be directed to your account dashboard, where you can manage your devices and settings.

    What to Do If You Can't Log In

    Sometimes, you may encounter issues when trying to log into your PC Matic account. Here’s how to troubleshoot:

    Forgot Password? Click on the Forgot Password link on the login page. You’ll be asked to enter your registered email address, and a reset link will be sent to you.

    Wrong Email? Make sure you're using the exact email address that you used during registration. Using a different email may prevent you from logging in.

    Account Locked? Too many failed login attempts can temporarily lock your account. Wait a few minutes and try again or reach out to PC Matic support for help.

    Keeping Your PC Matic Account Secure

    Security is a top priority for any account, especially when it’s tied to antivirus software. When you log into your PC Matic account, follow these best practices:

    Use a Strong Password: Combine uppercase letters, lowercase letters, numbers, and symbols for enhanced security.

    Enable Two-Factor Authentication (2FA): If available, use 2FA to add an extra layer of protection.

    Avoid Public Wi-Fi: Refrain from logging in on unsecured networks to prevent unauthorized access.

    Log Out on Shared Devices: Always sign out after use if you're on a public or shared computer.

    How to Stay Logged In

    If you're logging in from a secure, personal device and want to avoid entering your credentials every time, most browsers allow you to save login information. You can also check the “Remember Me” box on the login screen to stay signed in longer, depending on your account settings.

    Just remember: Only use this feature on private devices, not shared ones.

    Managing Your PC Matic Account After Login

    Once you log into your PC Matic account, your dashboard becomes your control hub. Here’s what you can do from there:

    View and Run Scans: Check scan history or initiate a new scan for malware or performance issues.

    Add or Remove Devices: Keep your PC Matic license in check by managing which devices are protected.

    Update Billing Info: Change your payment method, renew your subscription, or view invoices.

    Contact Support: Access the help center or submit support tickets directly from your account.

    Change Account Settings: Update your email, password, or notification preferences easily.

    Accessing PC Matic from Multiple Devices

    You don’t have to be on one device to manage your account. Whether you’re on a desktop, laptop, or tablet, you can log in to your PC Matic account from any internet-connected device. Just use your credentials, and your dashboard will sync with all connected machines.

    Conclusion

    Learning how to log into your PC Matic account is an essential step to unlocking the full potential of your antivirus and performance tools. With easy access to your dashboard, the ability to manage your devices, and options for enhancing security, logging in is your gateway to a safer, faster, and more efficient computing experience.

  18. Africa Crop Maize - Harvested Area (Mature Support)

    • africageoportal.com
    • rwanda.africageoportal.com
    • +4more
    Updated Nov 19, 2014
    + more versions
    Cite
    Esri (2014). Africa Crop Maize - Harvested Area (Mature Support) [Dataset]. https://www.africageoportal.com/datasets/6fab7020446c43b0b44727d6cb134ae8
    Explore at:
    Dataset updated
    Nov 19, 2014
    Dataset authored and provided by
    Esri (http://esri.com/)
    Area covered
    Description

    Important Note: This item is in mature support as of April 2025 and will be retired in December 2026. New data is available directly from the authoritative provider. Esri recommends accessing the data from the source provider as soon as possible, as our service will no longer be available after December 2026.

    Maize (Zea mays), also known as corn, is a crop of worldwide importance. Originally domesticated in what is now Mexico, its tolerance of diverse climates has led to its widespread cultivation. Globally, it is tied with rice as the second most widely grown crop; only wheat is more widely grown. In Africa it is grown throughout the agricultural regions of the continent, from the Nile Delta in the north to the country of South Africa in the south. In sub-Saharan Africa it is relied on as a staple crop by 50% of the population.

    Dataset Summary

    This layer provides access to a 5 arc-minute (approximately 10 km at the equator) cell-sized raster of the 1999-2001 annual average area of maize harvested in Africa. The data are in units of hectares/grid cell. The SPAM 2000 v3.0.6 data used to create this layer were produced by the International Food Policy Research Institute in 2012. This dataset was created by spatially disaggregating national and sub-national harvest data using the Spatial Production Allocation Model. Link to source metadata.

    For more information about this dataset and the importance of maize as a staple food, see the Harvest Choice webpage. For data on other agricultural species in Africa see these layers: Cassava, Groundnut (Peanut), Millet, Potato, Rice, Sorghum, Sweet Potato and Yam, Wheat. Data for important agricultural crops in South America are available here.

    What can you do with this layer?

    This layer is suitable for both visualization and analysis. It can be used in ArcGIS Online in web maps and applications and in ArcGIS Desktop. This layer has query, identify, and export image services available. The layer is restricted to a maximum area of 24,000 x 24,000 pixels, which allows access to the full dataset. The source data for this layer are available here.

    This layer is part of a larger collection of landscape layers that you can use to perform a wide variety of mapping and analysis tasks. The Living Atlas of the World provides an easy way to explore the landscape layers and many other beautiful and authoritative maps on hundreds of topics. Geonet is a good resource for learning more about landscape layers and the Living Atlas of the World. To get started follow these links: Landscape Layers - a reintroduction; Living Atlas Discussion Group.
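    Since each cell of the raster stores hectares of harvested maize, summing the grid while skipping nodata cells yields total harvested area. A minimal sketch, assuming the raster has already been read into nested lists (e.g. via rasterio or an ArcGIS export); the -9999 nodata marker and the toy values are assumptions, so check the layer's metadata:

```python
NODATA = -9999.0  # hypothetical nodata marker; confirm against the layer's metadata

def total_harvested_hectares(grid, nodata=NODATA):
    """Sum hectares/grid cell over a 2D grid of cell values, skipping nodata.

    `grid` stands in for the raster already read into nested lists
    (or an array); the file-reading code is omitted here.
    """
    return sum(v for row in grid for v in row if v != nodata)

# Toy 2x3 block of cells (hectares of maize harvested per ~10 km cell)
grid = [
    [120.0, 0.0, NODATA],
    [340.5, 59.5, NODATA],
]
print(total_harvested_hectares(grid))  # 520.0
```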

  19. Deep Water Fisheries Catch - Sea Around Us

    • niue-data.sprep.org
    • nauru-data.sprep.org
    • +13more
    zip
    Updated Feb 20, 2025
    Cite
    Secretariat of the Pacific Regional Environment Programme (2025). Deep Water Fisheries Catch - Sea Around Us [Dataset]. https://niue-data.sprep.org/dataset/deep-water-fisheries-catch-sea-around-us
    Explore at:
    zip(7560884), zip(2277194), zip(3416488), zip(2623755), zip(2585748), zip(2082951), zip(3366431), zip(2275911), zip(3360309), zip(2459620), zip(2705197), zip(2315699), zip(2484475), zip(2597447), zip(2327685), zip(1947413), zip(2520353), zip(2391700), zip(3021516), zip(2414876), zip(2390899), zip(3316429)Available download formats
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Pacific Regional Environment Programme (https://www.sprep.org/)
    License

    Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
    License information was derived automatically

    Area covered
    Pacific Region
    Description

    The Sea Around Us is a research initiative at The University of British Columbia (located at the Institute for the Oceans and Fisheries, formerly Fisheries Centre) that assesses the impact of fisheries on the marine ecosystems of the world, and offers mitigating solutions to a range of stakeholders.

    The Sea Around Us was initiated in collaboration with The Pew Charitable Trusts in 1999, and in 2014, the Sea Around Us also began a collaboration with The Paul G. Allen Family Foundation to provide African and Asian countries with more accurate and comprehensive fisheries data.

    The Sea Around Us provides data and analyses through View Data, articles in peer-reviewed journals, and other media (News). The Sea Around Us regularly updates products at the scale of countries’ Exclusive Economic Zones, Large Marine Ecosystems, the High Seas and other spatial scales, and as global maps and summaries.

    The Sea Around Us emphasizes catch time series starting in 1950, and related series (e.g., landed value and catch by flag state, fishing sector and catch type), and fisheries-related information on every maritime country (e.g., government subsidies, marine biodiversity). Information is also offered on sub-projects, e.g., the historic expansion of fisheries, the performance of Regional Fisheries Management Organizations, or the likely impact of climate change on fisheries.

    The information and data presented on their website is freely available to any user, granted that its source is acknowledged. The Sea Around Us is aware that this information may be incomplete. Please let them know about this via the feedback options available on this website.

    If you cite or display any content from the Site, or reference the Sea Around Us, the Sea Around Us – Indian Ocean, the University of British Columbia or the University of Western Australia, in any format, written or otherwise, including print or web publications, presentations, grant applications, websites, other online applications such as blogs, or other works, you must provide appropriate acknowledgement using a citation consistent with the following standard:

    When referring to various datasets downloaded from the website, and/or its concept or design, or to several datasets extracted from its underlying databases, cite its architects. Example: Pauly D., Zeller D., Palomares M.L.D. (Editors), 2020. Sea Around Us Concepts, Design and Data (seaaroundus.org).

    When referring to a set of values extracted for a given country, EEZ or territory, cite the most recent catch reconstruction report or paper (available on the website) for that country, EEZ or territory. Example: For the Mexican Pacific EEZ, the citation should be “Cisneros-Montemayor AM, Cisneros-Mata MA, Harper S and Pauly D (2015) Unreported marine fisheries catch in Mexico, 1950-2010. Fisheries Centre Working Paper #2015-22, University of British Columbia, Vancouver. 9 p.”, which is accessible on the EEZ page for Mexico (Pacific) on seaaroundus.org.

    To help us track the use of Sea Around Us data, we would appreciate you also citing Pauly, Zeller, and Palomares (2020) as the source of the information in an appropriate part of your text;

    When using data from our website that are not part of a typical catch reconstruction (e.g., catches by LME or other spatial entity, subsidies given to fisheries, the estuaries in a given country, or the surface area of a given EEZ), cite both the website and the study that generated the underlying database. Many of these can be derived from the ’methods’ texts associated with data pages on seaaroundus.org. Example: Sumaila et al. (2010) for subsides, Alder (2003) for estuaries and Claus et al. (2014) for EEZ delineations, respectively.

    The Sea Around Us data are (where not otherwise regulated) under a Creative Commons Attribution Non-Commercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/). Notices regarding copyrights (© The University of British Columbia), license and disclaimer can be found under http://www.seaaroundus.org/terms-and-conditions/.

    References:

    Alder J (2003) Putting the coast in the Sea Around Us Project. The Sea Around Us Newsletter (15): 1-2.

    Cisneros-Montemayor AM, Cisneros-Mata MA, Harper S and Pauly D (2015) Unreported marine fisheries catch in Mexico, 1950-2010. Fisheries Centre Working Paper #2015-22, University of British Columbia, Vancouver. 9 p.

    Pauly D, Zeller D, and Palomares M.L.D. (Editors) (2020) Sea Around Us Concepts, Design and Data (www.seaaroundus.org)

    Claus S, De Hauwere N, Vanhoorne B, Deckers P, Souza Dias F, Hernandez F and Mees J (2014) Marine Regions: Towards a global standard for georeferenced marine names and boundaries. Marine Geodesy 37(2): 99-125.

    Sumaila UR, Khan A, Dyck A, Watson R, Munro R, Tydemers P and Pauly D (2010) A bottom-up re-estimation of global fisheries subsidies. Journal of Bioeconomics 12: 201-225.

  20. Octo Browser: Your Ultimate Web Browsing Solution

    • kaggle.com
    Updated Apr 15, 2025
    Cite
    tahir tabassum (2025). Octo Browser: Your Ultimate Web Browsing Solution [Dataset]. https://www.kaggle.com/datasets/tahirtabassum/octo-browser-your-ultimate-web-browsing-solution/code
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 15, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    tahir tabassum
    Description

    Octo Browser: Your Ultimate Web Browsing Solution
    In today's digital world, having a good web browser is key. The Octo Browser is here to help. It offers a top-notch browsing experience unlike any other.

    This browser has cool features and is easy to use. It's perfect for anyone, whether you're just browsing or need it for work. It's made to make your online time better.
    The Octo Browser uses the latest tech. It loads pages quickly, keeps you safe, and is easy to get around. It's the best choice for anyone looking for a great browser.

    Key Takeaways
    - Advanced features for a seamless browsing experience
    - Robust security to protect your online activities
    - Fast page loading and intuitive navigation
    - User-centric design for enhanced usability
    - Ideal for both casual users and professionals

    Introducing Octo Browser
    Octo Browser is changing how we browse the web. It's a top-notch web browser that makes browsing fast. It's perfect for those who want quick and reliable results.
    Octo Browser has cool features that make browsing better. It's easy to use and works great.

    Key Features at a Glance
    Octo Browser has some key features:
    - High-speed page loading
    - Advanced security protocols
    - Intuitive interface design

    These features make browsing smooth and safe. Experts say it's a game-changer:
    "Octo Browser's blend of speed and security sets a new standard in the world of web browsers."

    How Octo Browser Stands Out
    Octo Browser is different because it focuses on speed and security. It offers a better browsing experience than others.

    Blazing-Fast Performance
    Octo Browser brings you the future of web browsing with its lightning-fast speed. It uses a top-notch rendering engine and smart resource management.

    Optimized Rendering Engine
    Octo Browser has an optimized rendering engine that makes pages load much faster. This means you can quickly move through your favorite websites.

    Efficient Resource Management
    The browser's efficient resource management makes sure your system runs smoothly. It prevents slowdowns and crashes. Key features include:
    - Intelligent memory allocation
    - Background process optimization
    - Prioritization of active tabs

    Speed Comparison with Leading Browsers
    Octo Browser is the fastest among leading browsers. Here's why:
    - Loads pages up to 30% faster than the average browser
    - Maintains speed even with multiple tabs open
    - Outperforms competitors in both JavaScript and page rendering tests

    Uncompromising Security and Privacy
    In today's digital world, security and privacy are key. Octo Browser is built with these in mind. It's a secure browser that protects your data from cyber threats.
    Octo Browser is all about keeping your online activities safe. It has strong features to do just that.

    Built-in Privacy Protection
    Octo Browser has privacy features to keep your browsing private. It stops tracking and profiling, so your habits stay hidden.
    It uses advanced anti-tracking tech. This blocks third-party cookies and other tracking tools.

    Advanced Data Encryption
    Data encryption is vital for online safety. Octo Browser uses advanced encryption protocols to secure your data.
    This means your data is safe from unauthorized access. It's protected when you send or store it.

    Automatic Security Updates
    Octo Browser also has automatic security updates. This keeps your browser current with the latest security fixes.
    This way, you're always safe from new threats. You don't have to manually update the browser.

    Seamless User Experience
    Octo Browser is designed with the user in mind. It offers a seamless user experience. This means users can easily explore their favorite websites.

    Intuitive Interface Design
    The Octo Browser has an intuitive interface design. It's easy to use and navigate. The layout is clean and simple, focusing on your browsing experience.

    Extensive Customization Options
    Octo Browser gives you extensive customization options. You can personalize your browsing experience. Choose from various themes, customize toolbar layouts, and more.
    - Choose from multiple theme options
    - Customize toolbar layouts
    - Personalize your browsing experience

    Cross-Device Synchronization
    Octo Browser's cross-device synchronization lets you access your data on different devices. This means you ...

Best Books Ever Dataset

Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096

Dataset updated: Nov 10, 2020
Dataset provided by: Zenodo (http://zenodo.org/)
Authors: Lorena Casanova Lozano; Sergio Costa Planells
Available download format: CSV
Scholarly citations: 3 articles cite this dataset
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/ (license information was derived automatically)
Description

The dataset was collected as part of the Prac1 assignment of the Typology and Data Life Cycle course of the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).

The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).

The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset

The data was retrieved in two batches: the first 30000 books, then the remaining 22478. Dates were not parsed and reformatted in the second batch, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and in "Month Day Year" format for the rest.
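Because the two batches use different date formats, any downstream analysis has to parse both. A minimal sketch with pandas (the exact textual format, e.g. "November 10 2020", is an assumption here):

```python
import pandas as pd

def parse_mixed_dates(series: pd.Series) -> pd.Series:
    """Parse dates stored either as mm/dd/yyyy or as 'Month Day Year'."""
    # Records from the first batch use the numeric mm/dd/yyyy format.
    numeric = pd.to_datetime(series, format="%m/%d/%Y", errors="coerce")
    # Records from the second batch use a textual format, assumed
    # here to look like "November 10 2020".
    textual = pd.to_datetime(series, format="%B %d %Y", errors="coerce")
    # Keep whichever parse succeeded for each record.
    return numeric.fillna(textual)

# Example: one date from each batch parses to the same timestamp.
dates = pd.Series(["11/10/2020", "November 10 2020"])
parsed = parse_mixed_dates(dates)
```

If the textual dates include ordinal suffixes ("November 10th 2020"), the format string would need adjusting accordingly.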

Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repo.
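As a rough sketch of what such a downloader might look like (the repo's actual code may differ; keying filenames by bookId is an assumed convention):

```python
import os
import urllib.request

def cover_filename(book_id: str, dest_dir: str = "covers") -> str:
    # Key each cover file by the book's bookId (an assumed convention).
    return os.path.join(dest_dir, f"{book_id}.jpg")

def download_cover(book_id: str, cover_url: str, dest_dir: str = "covers") -> str:
    """Fetch one cover image from its coverImg URL and save it locally."""
    os.makedirs(dest_dir, exist_ok=True)
    path = cover_filename(book_id, dest_dir)
    if not os.path.exists(path):  # skip covers already on disk
        urllib.request.urlretrieve(cover_url, path)
    return path
```

Skipping files that already exist makes the download resumable, which matters when fetching tens of thousands of images.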

The 25 fields of the dataset are:

| Attributes | Definition | Completeness (%) |
| ------------- | ------------- | ------------- | 
| bookId | Book Identifier as in goodreads.com | 100 |
| title | Book title | 100 |
| series | Series Name | 45 |
| author | Book's Author | 100 |
| rating | Global goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (ex. Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Publisher | 93 |
| publishDate | Publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Number of total ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field, percent of ratings over 2 stars (as on GoodReads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL to cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
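Since likedPercent is a derived field, it can be recomputed from ratingsByStars. A minimal sketch, assuming ratingsByStars holds counts ordered from 5 stars down to 1 star (the ordering is an assumption):

```python
def liked_percent(ratings_by_stars):
    """Share of ratings above 2 stars, as a percentage.

    Assumes ratings_by_stars lists counts ordered 5, 4, 3, 2, 1 stars.
    """
    total = sum(ratings_by_stars)
    if total == 0:
        return None  # no ratings, so the percentage is undefined
    liked = sum(ratings_by_stars[:3])  # 5-, 4- and 3-star counts
    return round(100 * liked / total, 2)

# Example: 90 of 100 ratings are 3 stars or higher.
print(liked_percent([50, 30, 10, 5, 5]))  # → 90.0
```

Recomputing the field this way is also a quick consistency check against the stored likedPercent values.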
