100+ datasets found
  1. (Dataset) The most visited health websites in the world

    • narcis.nl
    • data.mendeley.com
    Updated Jan 11, 2021
    Cite
    Acosta-Vargas, P (via Mendeley Data) (2021). (Dataset) The most visited health websites in the world [Dataset]. http://doi.org/10.17632/n468trh5my.1
    Explore at:
    Dataset updated
    Jan 11, 2021
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Acosta-Vargas, P (via Mendeley Data)
    Description

    Evaluation of the most visited health websites in the world

  2. Traces captured by visiting the top 1500 website

    • kaggle.com
    zip
    Updated Aug 25, 2021
    Cite
    DNS_dataset (2021). Traces captured by visiting the top 1500 website [Dataset]. https://www.kaggle.com/jacksontang16/traces-captured-by-visiting-the-top-1500-website
    Explore at:
    Available download formats: zip (5852806 bytes)
    Dataset updated
    Aug 25, 2021
    Authors
    DNS_dataset
    Description

    Dataset

    This dataset was created by DNS_dataset

    Contents

  3. Public Dataset Access and Usage

    • data.sfgov.org
    • s.cnmilf.com
    • +2more
    application/rdfxml +5
    Updated Jul 9, 2025
    Cite
    (2025). Public Dataset Access and Usage [Dataset]. https://data.sfgov.org/City-Infrastructure/Public-Dataset-Access-and-Usage/su99-qvi4
    Explore at:
    Available download formats: csv, application/rssxml, json, tsv, application/rdfxml, xml
    Dataset updated
    Jul 9, 2025
    Description

    A. SUMMARY This dataset is used to report on public dataset access and usage within the open data portal. Each row sums the number of users who accessed a dataset each day, grouped by access type (API Read, Download, Page View, etc).

    B. HOW THE DATASET IS CREATED This dataset is created by joining two internal analytics datasets generated by the SF Open Data Portal. We remove non-public information during the process.

    C. UPDATE PROCESS This dataset is scheduled to update every 7 days via ETL.

    D. HOW TO USE THIS DATASET This dataset can help you identify stale datasets, highlight the most popular datasets and calculate other metrics around the performance and usage in the open data portal.

    Please note a special call-out for two fields:
    • "derived": This field shows whether an asset is an original source (derived = "False") or was made from another asset through filtering (derived = "True").
    • "provenance": This field shows whether an asset is "official" (created by someone in the city of San Francisco) or "community" (created by a member of the community, not official). All community assets are derived, as members of the community cannot add data to the open data portal.
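
As a rough illustration of the uses listed under D, the sketch below totals users per dataset and flags datasets with no recent recorded access. The row layout, the dataset ids, and the 180-day threshold are assumptions made for this example; the real export's column names may differ, and "stale" is read here as "no recorded access", which is only one possible interpretation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (dataset_id, access_date, access_type, user_count).
# The ids and values are invented; the real column names may differ.
rows = [
    ("abcd-1234", date(2025, 7, 1), "Page View", 40),
    ("abcd-1234", date(2025, 7, 1), "Download", 5),
    ("wxyz-9876", date(2024, 1, 15), "API Read", 2),
]

def summarize(rows):
    """Return total users and the most recent access date per dataset."""
    totals = defaultdict(int)
    last_seen = {}
    for dataset_id, day, _access_type, users in rows:
        totals[dataset_id] += users
        if dataset_id not in last_seen or day > last_seen[dataset_id]:
            last_seen[dataset_id] = day
    return dict(totals), last_seen

def stale_datasets(last_seen, today, max_idle_days=180):
    """Datasets with no recorded access in the last `max_idle_days` days."""
    return sorted(d for d, day in last_seen.items()
                  if (today - day).days > max_idle_days)

totals, last_seen = summarize(rows)
stale = stale_datasets(last_seen, today=date(2025, 7, 9))
most_popular = max(totals, key=totals.get)
```

Filtering the rows on "derived" or "provenance" before calling `summarize` would restrict the same report to original or official assets.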

  4. Most visited websites by hierachycal categories

    • kaggle.com
    Updated Sep 18, 2020
    Cite
    Natanael de Souza Figueiredo (2020). Most visited websites by hierachycal categories [Dataset]. https://www.kaggle.com/natanael127/most-visited-websites-by-hierachycal-categories/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 18, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Natanael de Souza Figueiredo
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Alexa Internet was founded in April 1996 by Brewster Kahle and Bruce Gilliat. The company's name was chosen in homage to the Library of Alexandria of Ptolemaic Egypt, drawing a parallel between the largest repository of knowledge in the ancient world and the potential of the Internet to become a similar store of knowledge. (from Wikipedia)

    The categories list was due to be discontinued by September 17, 2020, so I wanted to save it. https://support.alexa.com/hc/en-us/articles/360051913314

    This dataset was produced by this Python script (V2.0): https://github.com/natanael127/dump-alexa-ranking

    Content

    The sites are grouped into 17 macro categories, and the resulting tree has more than 360,000 nodes. Subjects are well organized and each has its own ranking of most accessed domains, so even the keys of a sub-dictionary may make a good small dataset.
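
To make the tree-of-dictionaries idea concrete, here is a minimal sketch that counts category nodes and lists the keys of one sub-dictionary. The nesting shown, the `_sites` key, and the example domains are assumptions for illustration only; the actual file layout is not documented here.

```python
# Toy stand-in for the category tree. The nesting and the "_sites" key are
# assumptions; check the real dump before relying on this shape.
tree = {
    "Science": {
        "_sites": ["example-science.org"],
        "Physics": {"_sites": ["example-physics.org", "example-lab.org"]},
    },
    "Sports": {"_sites": ["example-sports.com"]},
}

def count_nodes(node):
    """Count category nodes (nested dicts) below `node`, excluding `node`."""
    return sum(1 + count_nodes(child)
               for child in node.values()
               if isinstance(child, dict))

def subcategories(node):
    """Names of the direct sub-categories of `node`."""
    return [key for key, child in node.items() if isinstance(child, dict)]

n = count_nodes(tree)                 # Science, Physics, Sports
subs = subcategories(tree["Science"])
```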

    Acknowledgements

    Thanks to my friend André (https://github.com/andrerclaudio) for helping me with tips on Google Colaboratory and with the computational power to get the data before our deadline.

    Inspiration

    The Alexa ranking was inspired by the Library of Alexandria. In the modern world, it may be a good starting point for AI to learn more about many, many subjects.

  5. ‘Popular Website Traffic Over Time’ analyzed by Analyst-2

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Popular Website Traffic Over Time ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-popular-website-traffic-over-time-62e4/62549059/?iid=003-357&v=presentation
    Explore at:
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Popular Website Traffic Over Time ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/popular-website-traffice on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Background

    Have you ever been in a conversation where the question comes up: who uses Bing? This question comes up occasionally because people wonder whether these sites get any views. For this research study, we explore traffic for many popular websites.

    Methodology

    The data collected originates from SimilarWeb.com.

    Source

    For the analysis and study, go to The Concept Center

    This dataset was created by Chase Willden and contains around 0 samples along with 1/1/2017, Social Media, technical information and other features such as: - 12/1/2016 - 3/1/2017 - and more.

    How to use this dataset

    • Analyze 11/1/2016 in relation to 2/1/2017
    • Study the influence of 4/1/2017 on 1/1/2017
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit Chase Willden

    --- Original source retains full ownership of the source dataset ---

  6. 1k_Website_Screenshots_and_Metadata

    • huggingface.co
    Updated Apr 13, 2023
    Cite
    Silatus (2023). 1k_Website_Screenshots_and_Metadata [Dataset]. https://huggingface.co/datasets/silatus/1k_Website_Screenshots_and_Metadata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 13, 2023
    Dataset authored and provided by
    Silatus
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for 1000 Website Screenshots with Metadata

      Dataset Summary
    

    Silatus is sharing, for free, a segment of a dataset that we are using to train a generative AI model for text-to-mockup conversions. This dataset was collected in December 2022 and early January 2023, so it contains very recent data from 1,000 of the world's most popular websites. You can get our larger 10,000 website dataset for free at: https://silatus.com/datasets This dataset includes: High-res… See the full description on the dataset page: https://huggingface.co/datasets/silatus/1k_Website_Screenshots_and_Metadata.

  7. Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and...

    • ieee-dataport.org
    Updated Oct 21, 2024
    Cite
    Mohamad Amar Irsyad Mohd Aminuddin (2024). Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and Mobile Webpages [Dataset]. https://ieee-dataport.org/documents/website-fingerprinting-dataset-browsing-network-traffic-desktop-and-mobile-webpages
    Explore at:
    Dataset updated
    Oct 21, 2024
    Authors
    Mohamad Amar Irsyad Mohd Aminuddin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a dataset of Tor cell files extracted from browsing simulations using Tor Browser. The simulations cover both desktop and mobile webpages. The data were collected using the WFP-Collector tool (https://github.com/irsyadpage/WFP-Collector). All the necessary configuration to perform the simulation is detailed in the tool repository. The webpage URLs were selected from the first 100 websites listed at https://dataforseo.com/free-seo-stats/top-1000-websites. Each webpage URL was visited 90 times in each of the desktop and mobile browsing modes.

  8. UI-Elements-Detection-Dataset

    • huggingface.co
    Updated Nov 26, 2024
    Cite
    Yash Jain (2024). UI-Elements-Detection-Dataset [Dataset]. https://huggingface.co/datasets/YashJain/UI-Elements-Detection-Dataset
    Explore at:
    Dataset updated
    Nov 26, 2024
    Authors
    Yash Jain
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Web UI Elements Dataset

      Overview
    

    A comprehensive dataset of web user interface elements collected from the world's most visited websites. This dataset is specifically curated for training AI models to detect and classify UI components, enabling automated UI testing, accessibility analysis, and interface design studies.

      Key Features
    

    • 300+ popular websites sampled
    • 15 essential UI element classes
    • High-resolution screenshots (1920x1080)
    • Rich accessibility metadata…

    See the full description on the dataset page: https://huggingface.co/datasets/YashJain/UI-Elements-Detection-Dataset.

  9. News Portal User Interactions by Globo.com

    • kaggle.com
    zip
    Updated Apr 16, 2019
    + more versions
    Cite
    Gabriel Moreira (2019). News Portal User Interactions by Globo.com [Dataset]. https://www.kaggle.com/gspmoreira/news-portal-user-interactions-by-globocom
    Explore at:
    Available download formats: zip (377105112 bytes)
    Dataset updated
    Apr 16, 2019
    Authors
    Gabriel Moreira
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    This large dataset of user interaction logs (page views) from a news portal was kindly provided by Globo.com, the most popular news portal in Brazil, for reproducibility of the experiments with CHAMELEON, a meta-architecture for contextual hybrid session-based news recommender systems. The source code was made available on GitHub.

    The first version (v1) (download) of this dataset was released for reproducibility of the experiments presented in the following paper:

    Gabriel de Souza Pereira Moreira, Felipe Ferreira, and Adilson Marques da Cunha. 2018. News Session-Based Recommendations using Deep Neural Networks. In 3rd Workshop on Deep Learning for Recommender Systems (DLRS 2018), October 6, 2018, Vancouver, BC, Canada. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3270323.3270328

    A second version (v2) (download) of this dataset was made available for reproducibility of the experiments presented in the following paper. Compared to the v1, the only differences are:

    • Included four additional user contextual attributes (click_os, click_country, click_region, click_referrer_type)
    • Removed repeated clicks (clicks on the same article) within sessions. Sessions with fewer than two clicks (the minimum for the next-click prediction task) were removed

    Gabriel de Souza Pereira Moreira, Dietmar Jannach, and Adilson Marques da Cunha. 2019. Contextual Hybrid Session-based News Recommendation with Recurrent Neural Networks. arXiv preprint arXiv:1904.10367, 49 pages
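
The second v2 change (dropping repeated clicks within a session, then discarding sessions too short for next-click prediction) could be sketched roughly as follows. This is not the authors' code: the session layout is hypothetical, and keeping the first occurrence of a repeated article is an assumption, since the description does not say which click is retained.

```python
def clean_sessions(sessions, min_clicks=2):
    """Deduplicate clicks per session and drop sessions that become too short.

    sessions: dict mapping session_id -> list of article ids in click order
    (a hypothetical layout chosen for this sketch).
    """
    cleaned = {}
    for session_id, clicks in sessions.items():
        seen = set()
        unique_clicks = []
        for article_id in clicks:
            if article_id not in seen:   # keep only the first click per article
                seen.add(article_id)
                unique_clicks.append(article_id)
        if len(unique_clicks) >= min_clicks:
            cleaned[session_id] = unique_clicks
    return cleaned

example = {"s1": [10, 11, 10, 12], "s2": [7, 7, 7], "s3": [3]}
result = clean_sessions(example)
```

In this toy input, `s2` collapses to a single click and `s3` has only one to begin with, so both fall below the two-click minimum.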

    You are not allowed to use this dataset for commercial purposes, only with academic objectives (like education or research). If used for research, please cite the above papers.

    Content

    The dataset contains a sample of user interactions (page views) in G1 news portal from Oct. 1 to 16, 2017, including about 3 million clicks, distributed in more than 1 million sessions from 314,000 users who read more than 46,000 different news articles during that period.

    It is composed of three files/folders:

    • clicks.zip - Folder with CSV files (one per hour), containing user sessions interactions in the news portal.
    • articles_metadata.csv - CSV file with metadata information about all (364047) published articles
    • articles_embeddings.pickle - Pickle (Python 3) of a NumPy matrix containing the Article Content Embeddings (250-dimensional vectors), trained on the articles' text and metadata by CHAMELEON's ACR module (see paper for details) for 364047 published articles.
      P.s. The full text of news articles could not be provided due to license restrictions, but those embeddings can be used by Neural Networks to represent their content. See this paper for a t-SNE visualization of these embeddings, colored by category.

    Acknowledgements

    I would like to thank Globo.com for providing this dataset for this research and for the academic community, and especially Felipe Ferreira of Globo.com for preparing the original dataset.

    Dataset banner photo by rawpixel on Unsplash

    Inspiration

    This dataset might be very useful if you want to implement and evaluate hybrid and contextual news recommender systems, using both user interactions and article content and metadata to provide recommendations. You might also use it for analytics, for example to understand how user interactions in a news portal are distributed by user, by article, or by category.

    If you are interested in a dataset of user interactions on articles with the full text provided, to experiment with different text representations using NLP, you might want to take a look at this smaller dataset.

  10. Most popular database management systems worldwide 2024

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Most popular database management systems worldwide 2024 [Dataset]. https://www.statista.com/statistics/809750/worldwide-popularity-ranking-database-management-systems/
    Explore at:
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 2024
    Area covered
    Worldwide
    Description

    As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.

    Database Management Systems

    As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world’s growing focus on big data and data analytics, knowledge of SQL programming languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.

  11. Expenditure on Inbound Tourist Trips by Purpose of Visit

    • datasource.kapsarc.org
    Updated Jun 29, 2025
    Cite
    (2025). Expenditure on Inbound Tourist Trips by Purpose of Visit [Dataset]. https://datasource.kapsarc.org/explore/dataset/expenditure-on-inbound-tourist-trips-by-purpose-of-visit/
    Explore at:
    Dataset updated
    Jun 29, 2025
    Description

    Explore detailed tourism expenditure data in Saudi Arabia, including total expenditure, visits to relatives and friends, holidays, shopping, business conferences, and more. Obtain valuable insights and statistics for SAMA Annual reports.

    Total Expenditure, Visits To Relatives And Friends, Annually, Holidays and Shopping, Other Purposes, Religious Purposes, Business and Conferences, Visitors, Expenditure, Tourism Statistics, SAMA Annual

    Saudi Arabia

    Follow data.kapsarc.org for timely data to advance energy economics research.

    Notes: Includes data on overnight visitors only.

  12. Dataset used for HTTPS traffic classification using packet burst statistics

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 11, 2022
    Cite
    Cejka Tomas (2022). Dataset used for HTTPS traffic classification using packet burst statistics [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4911550
    Explore at:
    Dataset updated
    Apr 11, 2022
    Dataset provided by
    Hynek Karel
    Tropkova Zdena
    Cejka Tomas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We are publishing a dataset we created for the HTTPS traffic classification.

    Since the data were captured mainly on a real backbone network, we omitted IP addresses and ports. The datasets consist of data calculated from bidirectional flows exported with the flow probe ipfixprobe. This exporter can export a sequence of packet lengths and times and a sequence of packet bursts and times. For more information, please visit the ipfixprobe repository.

    During our research, we divided HTTPS traffic into five categories: L -- Live Video Streaming, P -- Video Player, M -- Music Player, U and D -- File Upload and File Download, and W -- Website and other traffic.

    We have chosen the service representatives known for particular traffic types based on the Alexa Top 1M list and Moz's list of the most popular 500 websites for each category. We also used several popular websites that primarily focus on the audience in our country. The identified traffic classes and their representatives are provided below:

    • Live Video Stream: Twitch, Czech TV, YouTube Live
    • Video Player: DailyMotion, Stream.cz, Vimeo, YouTube
    • Music Player: AppleMusic, Spotify, SoundCloud
    • File Upload/Download: FileSender, OwnCloud, OneDrive, Google Drive
    • Website and Other Traffic: Websites from Alexa Top 1M list

  13. Recipes dataset from allrecipes

    • crawlfeeds.com
    csv, zip
    Updated Jul 3, 2025
    Cite
    Crawl Feeds (2025). Recipes dataset from allrecipes [Dataset]. https://crawlfeeds.com/datasets/recipes-dataset-from-allrecipes
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Unleash the culinary potential with our comprehensive Recipes dataset from Allrecipes. This dataset provides detailed information on a vast collection of recipes sourced from Allrecipes, one of the world's most popular recipe websites. Ideal for chefs, food enthusiasts, developers, and data scientists, this dataset offers an extensive range of culinary possibilities.

    The dataset includes key details such as recipe titles, ingredients, preparation instructions, cooking times, user ratings, and dietary categories. With recipes spanning various cuisines, dietary preferences, and meal types, this dataset is a valuable resource for creating recipe apps, conducting nutritional analysis, or exploring new culinary trends.

    Looking for more data to fuel your food-related projects? Check out our Food & Beverage Data for diverse datasets designed to inspire and empower innovation in the food and beverage industry.

    Enhance your food-related projects with structured, high-quality data from Allrecipes. Whether developing a recipe recommendation engine, building a food blog, or researching cooking trends, this dataset is your go-to resource for delicious inspiration and data-driven culinary insights.

  14. Dataset used for detecting DNS over HTTPS by Machine Learning.

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 28, 2020
    Cite
    Vekshin,Dmitrii (2020). Dataset used for detecting DNS over HTTPS by Machine Learning. [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3818004
    Explore at:
    Dataset updated
    Oct 28, 2020
    Dataset provided by
    Vekshin,Dmitrii
    Cejka,Tomas
    Hynek,Karel
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The dataset consists of three different data sources:

    DoH enabled Firefox

    DoH enabled Google Chrome

    Cloudflared DoH proxy

    The web browser data were captured using the Selenium framework, which simulated classical user browsing. The browsers received commands to visit domains taken from Alexa's top 10K most visited websites. The capture was performed on the host by listening to the network interface of the virtual machine. Overall, the dataset contains almost 5,000 web-page visits by Mozilla and 1,000 pages visited by Chrome.

    The Cloudflared DoH proxy was installed on a Raspberry Pi, and the IP address of the Raspberry Pi was set as the default DNS resolver in two separate offices at our university. It continuously captured the DNS/DoH traffic created by up to 20 devices for around three months.

    The dataset contains 1,128,904 flows, of which around 33,000 are labeled as DoH. We provide raw pcap data, a CSV with flow data, and a CSV file with extracted features.

    The CSV with extracted features has the following data fields:

    • Label (1 - DoH, 0 - regular HTTPS)
    • Data source
    • Duration
    • Minimal Inter-Packet Delay
    • Maximal Inter-Packet Delay
    • Average Inter-Packet Delay
    • Variance of Incoming Packet Sizes
    • Variance of Outgoing Packet Sizes
    • Ratio of the number of Incoming and Outgoing bytes
    • Ratio of the number of Incoming and Outgoing packets
    • Average of Incoming Packet sizes
    • Average of Outgoing Packet sizes
    • The median value of Incoming Packet sizes
    • The median value of outgoing Packet sizes
    • The ratio of bursts and pauses
    • Number of bursts
    • Number of pauses
    • Autocorrelation
    • Transmission symmetry in the 1st third of connection
    • Transmission symmetry in the 2nd third of connection
    • Transmission symmetry in the last third of connection
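
For orientation, the sketch below shows how a few of the simpler fields in this list (duration, inter-packet delays, size variances, and the incoming/outgoing ratios) might be computed from one flow. It is not the authors' extraction code: the packet tuple layout is hypothetical, population variance is an assumption (the description does not say which variance is used), and the function expects at least two packets with at least one in each direction.

```python
from statistics import mean, pvariance

def flow_features(packets):
    """Compute a few per-flow features.

    packets: list of (timestamp_seconds, size_bytes, direction) tuples,
    where direction is "in" or "out" (a layout assumed for this sketch).
    """
    times = sorted(t for t, _, _ in packets)
    delays = [b - a for a, b in zip(times, times[1:])]   # inter-packet delays
    sizes_in = [s for _, s, d in packets if d == "in"]
    sizes_out = [s for _, s, d in packets if d == "out"]
    return {
        "duration": times[-1] - times[0],
        "min_ipd": min(delays),
        "max_ipd": max(delays),
        "avg_ipd": mean(delays),
        "var_in": pvariance(sizes_in),     # population variance (assumption)
        "var_out": pvariance(sizes_out),
        "bytes_ratio": sum(sizes_in) / sum(sizes_out),
        "packets_ratio": len(sizes_in) / len(sizes_out),
    }

packets = [(0.0, 100, "out"), (0.5, 300, "in"), (1.5, 100, "out"), (2.0, 500, "in")]
features = flow_features(packets)
```

The burst, pause, autocorrelation, and symmetry fields depend on definitions given in the cited paper and are left out here.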

    The observed network traffic does not contain privacy-sensitive information.

    The zip file structure is:

    |-- data
    |   |-- extracted-features ... extracted features used in ML for DoH recognition
    |   |   |-- chrome
    |   |   |-- cloudflared
    |   |   `-- firefox
    |   |-- flows ... exported flow data
    |   |   |-- chrome
    |   |   |-- cloudflared
    |   |   `-- firefox
    |   `-- pcaps ... raw PCAP data
    |       |-- chrome
    |       |-- cloudflared
    |       `-- firefox
    |-- LICENSE
    `-- README.md

    When using this dataset, please cite the original work as follows:

    @inproceedings{vekshin2020,
      author = {Vekshin, Dmitrii and Hynek, Karel and Cejka, Tomas},
      title = {DoH Insight: Detecting DNS over HTTPS by Machine Learning},
      year = {2020},
      isbn = {9781450388337},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      url = {https://doi.org/10.1145/3407023.3409192},
      doi = {10.1145/3407023.3409192},
      booktitle = {Proceedings of the 15th International Conference on Availability, Reliability and Security},
      articleno = {87},
      numpages = {8},
      keywords = {classification, DoH, DNS over HTTPS, machine learning, detection, datasets},
      location = {Virtual Event, Ireland},
      series = {ARES '20}
    }

  15. Most Popular Baby Names

    • data.chhs.ca.gov
    • data.ca.gov
    • +3more
    csv
    Updated Dec 30, 2024
    + more versions
    Cite
    California Department of Public Health (2024). Most Popular Baby Names [Dataset]. https://data.chhs.ca.gov/dataset/most-popular-baby-names-2005-current
    Explore at:
    Available download formats: csv(1219), csv(121160)
    Dataset updated
    Dec 30, 2024
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    Description

    This dataset contains ranks and counts for the top 25 baby names by sex for live births that occurred in California (by occurrence) based on information entered on birth certificates.

  16. YouTube Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Jan 9, 2023
    Cite
    Bright Data (2023). YouTube Datasets [Dataset]. https://brightdata.com/products/datasets/youtube
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Jan 9, 2023
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    YouTube, Worldwide
    Description

    Use our YouTube profiles dataset to extract both business and non-business information from public channels and filter by channel name, views, creation date, or subscribers. Datapoints include URL, handle, banner image, profile image, name, subscribers, description, video count, create date, views, details, and more. You may purchase the entire dataset or a customized subset, depending on your needs. Popular use cases for this dataset include sentiment analysis, brand monitoring, influencer marketing, and more.

  17. Most popular websites in the Netherlands 2015

    • datacatalogue.cessda.eu
    • ssh.datastations.nl
    Updated Jul 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    M. Kleppe; H. Bijleveld (2023). Most popular websites in the Netherlands 2015 [Dataset]. http://doi.org/10.17026/dans-x6h-6qqt
    Explore at:
    Dataset updated
    Jul 4, 2023
    Dataset provided by
    Vrije Universiteit Amsterdam
    Authors
    M. Kleppe; H. Bijleveld
    Area covered
    Netherlands
    Description

    This dataset contains a list of 3654 Dutch websites that we considered the most popular websites in 2015. This list served as whitelist for the Newstracker Research project in which we monitored the online web behaviour of a group of respondents.

    The research project 'The Newstracker' was a subproject of the NWO-funded project 'The New News Consumer: A User-Based Innovation Project to Meet Paradigmatic Change in News Use and Media Habits'.

    For the Newstracker project we aimed to understand the web behaviour of a group of respondents. We created custom-built software to monitor their web browsing behaviour on their laptops and desktops (the code is available in open access at https://github.com/NITechLabs/NewsTracker). For reasons of scale and privacy we created a whitelist of the websites that were most popular in 2015. We manually compiled this list using data from DDMM, Alexa and our own research. The dataset consists of 5 columns:
    - the URL
    - the type of website: We created a list of types of websites and each website has been manually labeled with 1 category
    - Nieuws-regio: When the category was 'News', we subdivided these websites by regional focus: International, National or Local
    - Nieuws-onderwerp: Furthermore, each website under the category News was further subdivided by type of news website. For this we created our own list of news categories and manually coded each website
    - Bron: For each website we noted which source we used to find this website.
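
A whitelist of this kind is typically applied by comparing the host of each visited URL against the listed sites. The sketch below illustrates that idea only; it is not taken from the NewsTracker code, and exact host matching with a stripped `www.` prefix, as well as all the example domains, are assumptions.

```python
from urllib.parse import urlsplit

def host_of(url):
    """Lower-cased host name of a URL, without a leading 'www.'."""
    if "//" not in url:
        url = "//" + url          # let urlsplit treat bare domains as hosts
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def filter_visits(visits, whitelist_urls):
    """Keep only visited URLs whose host appears in the whitelist."""
    allowed = {host_of(u) for u in whitelist_urls}
    return [v for v in visits if host_of(v) in allowed]

# Invented example entries, standing in for the URL column of the dataset.
whitelist = ["www.example-news.nl", "example-shop.nl"]
visits = [
    "https://example-news.nl/artikel/1",
    "http://private-site.example/page",
    "https://www.example-shop.nl/cart",
]
kept = filter_visits(visits, whitelist)
```

Visits to hosts outside the list are simply dropped, which matches the privacy motivation described above.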

    The full description of the research design of the Newstracker including the set-up of this whitelist is included in the following article: Kleppe, M., Otte, M. (in print), 'Analysing & understanding news consumption patterns by tracking online user behaviour with a multimodal research design', Digital Scholarship in the Humanities, doi 10.1093/llc/fqx030.

  18. Global Movie Popularity Dataset

    • opendatabay.com
    Updated Jul 3, 2025
    + more versions
    Cite
    Datasimple (2025). Global Movie Popularity Dataset [Dataset]. https://www.opendatabay.com/data/dataset/c9597b23-d205-46ff-abb3-674815373730
    Explore at:
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    This dataset provides details on the 10,000 most popular films globally, sourced from The Movie Database (TMDb) via its read API. TMDb is a crowd-sourced movie information database widely used by various film-related platforms and applications. The dataset is ideal for film-related analysis, building recommender systems, and natural language processing tasks, even for those new to data analysis, as it contains some missing values.

    Columns

    • index: An identifier for each record.
    • title: The name of the movie.
    • overview: A concise summary or synopsis of the movie.
    • original_language: The primary language in which the movie was filmed.
    • vote_count: The number of votes received for the movie, also indicated as the date of publish in some contexts.
    • vote_average: The average rating given to the movie by voters.
    • popularity: A metric indicating the popularity score of the movie.

    Distribution

    The dataset is provided in a CSV file format. It comprises approximately 10,000 individual movie records. While exact row and record counts are not specified, the dataset is structured as tabular data, with each row representing a unique movie entry and columns detailing various attributes.
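A minimal sketch of how one might check such a CSV file for missing values, using only the Python standard library. The column names come from the list above; the sample rows are hypothetical placeholders invented for illustration, not records from the dataset, and the helper name `count_missing` is an assumption.

```python
import csv
import io

# Column names as listed in the dataset description above.
COLUMNS = ["index", "title", "overview", "original_language",
           "vote_count", "vote_average", "popularity"]


def count_missing(csv_file):
    """Return (row_count, {column: number of empty or absent values})."""
    reader = csv.DictReader(csv_file)
    missing = {name: 0 for name in COLUMNS}
    rows = 0
    for row in reader:
        rows += 1
        for name in COLUMNS:
            value = row.get(name)
            # Treat an absent field or a blank string as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return rows, missing


# Hypothetical sample rows, invented for illustration only (not real records).
SAMPLE = """index,title,overview,original_language,vote_count,vote_average,popularity
0,Example Film A,A placeholder synopsis.,en,120,7.1,55.2
1,Example Film B,,fr,45,,12.9
2,Example Film C,Another placeholder.,,8,5.5,3.4
"""

rows, missing = count_missing(io.StringIO(SAMPLE))
print(rows, missing)
```

With the real file, the same function could be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the downloaded CSV was saved.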

    Usage

    This dataset is well-suited for a variety of applications, including:

    • Developing and enhancing film-related consoles, websites, and mobile applications.
    • Creating movie recommender systems.
    • Performing data visualisations related to film trends and popularity.
    • Conducting natural language processing (NLP) tasks on movie overviews.
    • Data analysis and exploration, particularly for those looking to practise handling missing data.

    Coverage

    The dataset covers movies from across the world, offering a global scope. While a specific time range for the movies is not explicitly stated, the data is fetched from TMDb, which updates its API periodically. It's noted that the dataset includes some null values where information was missing from the original TMDb database.

    License

    CC0

    Who Can Use It

    This dataset is intended for a broad audience including:

    • Young analysts: To practise data cleaning and analysis with datasets containing missing values.
    • Developers: For integrating movie information into media managers, mobile apps, and social sites.
    • Researchers: For studies on movie popularity, audience reception, and content analysis.
    • Data scientists: For building and testing machine learning models such as recommender systems and NLP models.

    Dataset Name Suggestions

    • TMDb Popular Movies
    • Global Movie Popularity Dataset
    • Top Movies from TMDb API
    • Movie Data for Film Analysis
    • TMDb Film Insights

    Attributes

    Original Data Source: Popular Movies of IMDb

  19. T

    United States Tourist Arrivals

    • tradingeconomics.com
    • fr.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    Cite
    TRADING ECONOMICS, United States Tourist Arrivals [Dataset]. https://tradingeconomics.com/united-states/tourist-arrivals
    Available download formats: json, csv, excel, xml
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 31, 1996 - Apr 30, 2025
    Area covered
    United States
    Description

    Tourist arrivals in the United States increased to 5,957,985 in April 2025 from 5,410,331 in March 2025. This dataset provides actual values, historical data, forecasts, charts, statistics, an economic calendar, and news for United States Tourist Arrivals.

  20. o

    Artisanal mining site visits in Eastern DRC - Dataset - openAFRICA

    • open.africa
    Updated Feb 7, 2019
    + more versions
    Cite
    (2019). Artisanal mining site visits in Eastern DRC - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/artisanal-mining-site-visits-in-eastern-drc
    Dataset updated
    Feb 7, 2019
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Democratic Republic of the Congo
    Description

    IPIS has collected data on artisanal mining sites since 2009 and has made it publicly accessible on webmaps and in analytical reports. The upgraded map presents new mining sites, bringing the total to more than 2400 sites visited as recently as December 2017. New information on the mining sites has been included, and a new layer displaying hundreds of roadblocks has been added. The latest update of the map has been supported by the International Organization for Migration (IOM) in the DRC, through the USAID-funded Responsible Minerals Trade (RMT) project.

Acosta-Vargas, P (via Mendeley Data) (2021). (Dataset) The most visited health websites in the world [Dataset]. http://doi.org/10.17632/n468trh5my.1

(Dataset) The most visited health websites in the world

3 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jan 11, 2021
Dataset provided by
Data Archiving and Networked Services (DANS)
Authors
Acosta-Vargas, P (via Mendeley Data)
Description

Evaluation of the most visited health websites in the world
