https://creativecommons.org/publicdomain/zero/1.0/
This dataset consists of the top 50 most visited websites in the world, as well as the category and principal country/territory for each site. The data provides insights into which sites are most popular globally and what type of content is most popular in different parts of the world.
This dataset can be used to track the most popular websites in the world over time. It can also be used to compare website popularity between different countries and categories.
- To track the most popular websites in the world over time
- To see how website popularity changes by region
- To find out which website categories are most popular
Dataset by Alexa Internet, Inc. (2019), released on Kaggle under the Open Data Commons Public Domain Dedication and License (ODC-PDDL)
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: df_1.csv

| Column name | Description |
|:-----------------------------|:----------------------------------------------------------------------|
| Site | The name of the website. (String) |
| Domain Name | The domain name of the website. (String) |
| Category | The category of the website. (String) |
| Principal country/territory | The principal country/territory where the website is based. (String) |
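A minimal sketch of loading and summarizing the table, assuming df_1.csv is available locally with the columns listed above:

```python
import pandas as pd

# Load the top-50 websites table described above.
df = pd.read_csv("df_1.csv")

# Most common categories among the top 50 sites.
print(df["Category"].value_counts())

# Number of top-50 sites per principal country/territory.
print(df["Principal country/territory"].value_counts())
```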
We'll extract any data from any website on the Internet. You don't have to worry about buying and maintaining complex and expensive software, or hiring developers.
Some common use cases our customers use the data for: • Data Analysis • Market Research • Price Monitoring • Sales Leads • Competitor Analysis • Recruitment
We can get data from websites with pagination or scroll, with captchas, and even from behind logins. Text, images, videos, documents.
Receive data in any format you need: Excel, CSV, JSON, or any other.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is designed to aid in the analysis and detection of phishing websites. It contains various features that help distinguish between legitimate and phishing websites based on their structural, security, and behavioral attributes.
- Result (Indicates whether a website is phishing or legitimate)
- Prefix_Suffix – Checks if the URL contains a hyphen (-), which is commonly used in phishing domains.
- double_slash_redirecting – Detects if the URL redirects using //, which may indicate a phishing attempt.
- having_At_Symbol – Identifies the presence of @ in the URL, which can be used to deceive users.
- Shortining_Service – Indicates whether the URL uses a shortening service (e.g., bit.ly, tinyurl).
- URL_Length – Measures the length of the URL; phishing URLs tend to be longer.
- having_IP_Address – Checks if an IP address is used in place of a domain name, which is suspicious.
- having_Sub_Domain – Evaluates the number of subdomains; phishing sites often have excessive subdomains.
- SSLfinal_State – Indicates whether the website has a valid SSL certificate (secure connection).
- Domain_registeration_length – Measures the duration of domain registration; phishing sites often have short lifespans.
- age_of_domain – The age of the domain in days; older domains are usually more trustworthy.
- DNSRecord – Checks if the domain has valid DNS records; phishing domains may lack these.
- Favicon – Determines if the website uses an external favicon (which can be a sign of phishing).
- port – Identifies if the site is using suspicious or non-standard ports.
- HTTPS_token – Checks if "HTTPS" is included in the URL but is used deceptively.
- Request_URL – Measures the percentage of external resources loaded from different domains.
- URL_of_Anchor – Analyzes anchor tags (<a> links) and their trustworthiness.
- Links_in_tags – Examines <meta>, <script>, and <link> tags for external links.
- SFH (Server Form Handler) – Determines if form actions are handled suspiciously.
- Submitting_to_email – Checks if forms submit data directly to an email instead of a web server.
- Abnormal_URL – Identifies if the website's URL structure is inconsistent with common patterns.
- Redirect – Counts the number of redirects; phishing websites may have excessive redirects.
- on_mouseover – Checks if the website changes content when hovered over (used in deceptive techniques).
- RightClick – Detects if right-click functionality is disabled (phishing sites may disable it).
- popUpWindow – Identifies the presence of pop-ups, which can be used to trick users.
- Iframe – Checks if the website uses <iframe> tags, often used in phishing attacks.
- web_traffic – Measures the website's Alexa ranking; phishing sites tend to have low traffic.
- Page_Rank – Google PageRank score; phishing sites usually have a low PageRank.
- Google_Index – Checks if the website is indexed by Google (phishing sites may not be indexed).
- Links_pointing_to_page – Counts the number of backlinks pointing to the website.
- Statistical_report – Uses external sources to verify if the website has been reported for phishing.
- Result – The classification label (1: Legitimate, -1: Phishing)

This dataset is valuable for:
✅ Machine Learning Models – Developing classifiers for phishing detection.
✅ Cybersecurity Research – Understanding patterns in phishing attacks.
✅ Browser Security Extensions – Enhancing anti-phishing tools.
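As a minimal sketch of how these features could feed a phishing classifier, the snippet below trains a random-forest model on the table. The file name (phishing.csv) is an assumption; the feature columns and the Result label follow the list above.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumed file name; the actual dataset file may differ.
data = pd.read_csv("phishing.csv")

X = data.drop(columns=["Result"])   # feature columns described above
y = data["Result"]                  # 1 = legitimate, -1 = phishing

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```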
The global number of internet users was forecast to increase continuously between 2024 and 2029 by a total of 1.3 billion users (+23.66 percent). After the fifteenth consecutive year of growth, the number of users is estimated to reach 7 billion, a new peak, in 2029. Notably, the number of internet users has been increasing continuously over the past years. Depicted is the estimated number of individuals in the country or region at hand who use the internet. As the data source clarifies, connection quality and usage frequency are distinct aspects not taken into account here. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of internet users in countries like the Americas and Asia.
https://creativecommons.org/publicdomain/zero/1.0/
Alexa Internet was founded in April 1996 by Brewster Kahle and Bruce Gilliat. The company's name was chosen in homage to the Library of Alexandria of Ptolemaic Egypt, drawing a parallel between the largest repository of knowledge in the ancient world and the potential of the Internet to become a similar store of knowledge. (from Wikipedia)
The categories list was being retired by September 17, 2020, so I wanted to save it. https://support.alexa.com/hc/en-us/articles/360051913314
This dataset was generated by this Python script (V2.0): https://github.com/natanael127/dump-alexa-ranking
The sites are grouped into 17 macro categories, and the resulting tree ends up having more than 360,000 nodes. Subjects are well organized, and each of them has its own ranking of most-accessed domains, so even the keys of a sub-dictionary can make a good small dataset to use (see the sketch below).
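A minimal sketch of exploring such a dump, assuming it is stored as a nested JSON dictionary of categories; the file name and exact structure below are assumptions, since the real layout depends on the dump-alexa-ranking script's output:

```python
import json

# Assumed file name; the exact structure depends on the dump-alexa-ranking output.
with open("alexa_categories.json") as f:
    tree = json.load(f)

def count_nodes(node):
    """Recursively count category keys in the nested dictionary."""
    if not isinstance(node, dict):
        return 0
    return len(node) + sum(count_nodes(child) for child in node.values())

print("Macro categories:", list(tree.keys()))
print("Total category nodes:", count_nodes(tree))
```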
Thanks to my friend André (https://github.com/andrerclaudio) for helping me with Google Colaboratory tips and computational power to get the data before our deadline.
The Alexa ranking was inspired by the Library of Alexandria. In the modern world, it may be a good starting point for AI to learn about many, many subjects of the world.
When asked about "Attitudes towards the internet", most Mexican respondents pick "It is important to me to have mobile internet access in any place at any time" as an answer. 55 percent did so in our online survey in 2024.
https://creativecommons.org/publicdomain/zero/1.0/
This is prepared data from Crunchyroll web-scraped data; using the code linked here, I extracted metadata from Crunchyroll web pages.
Each row represents a series on the Popular page. Note: some information is not updated (I suspect Crunchyroll does not update its Popular table in the database).
It also has similar features to popular.csv, but with updated data points.
Each row represents a season of its corresponding series.
Information about individual episodes of their corresponding series.
Some series have a featured music collection.
A mapping of each episode to its dubbed audio versions.
A mapping of each series to its categories, as defined by Crunchyroll.
When asked about "Attitudes towards the internet", most Chinese respondents pick "It is important to me to have mobile internet access in any place at any time" as an answer. 49 percent did so in our online survey in 2024.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Percentage of Internet users who have experienced selected personal effects in their life because of the Internet and the use of social networking websites or apps, during the past 12 months.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Mind2Web
Dataset Summary
Mind2Web is a dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing datasets for web agents either use simulated websites or only cover a limited set of websites and tasks, thus not suitable for generalist web agents. With over 2,000 open-ended tasks collected from 137 websites spanning 31 domains and crowdsourced action… See the full description on the dataset page: https://huggingface.co/datasets/osunlp/Mind2Web.
The data represent web-scraping of hyperlinks from a selection of environmental stewardship organizations that were identified in the 2017 NYC Stewardship Mapping and Assessment Project (STEW-MAP) (USDA 2017). There are two data sets: 1) the original scrape containing all hyperlinks within the websites and associated attribute values (see "README" file); 2) a cleaned and reduced dataset formatted for network analysis.

For dataset 1: Organizations were selected from the 2017 NYC Stewardship Mapping and Assessment Project (STEW-MAP) (USDA 2017), a publicly available, spatial data set about environmental stewardship organizations working in New York City, USA (N = 719). To create a smaller and more manageable sample to analyze, all organizations that intersected (i.e., worked entirely within or overlapped) the NYC borough of Staten Island were selected for a geographically bounded sample. Only organizations with working websites and that the web scraper could access were retained for the study (n = 78). The websites were scraped between 09 and 17 June 2020 to a maximum search depth of ten using the snaWeb package (version 1.0.1, Stockton 2020) in the R computational language environment (R Core Team 2020).

For dataset 2: The complete scrape results were cleaned, reduced, and formatted as a standard edge-array (node1, node2, edge attribute) for network analysis. See "README" file for further details.

References: R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/. Version 4.0.3. Stockton, T. (2020). snaWeb Package: An R package for finding and building social networks for a website, version 1.0.1. USDA Forest Service. (2017). Stewardship Mapping and Assessment Project (STEW-MAP). New York City Data Set. Available online at https://www.nrs.fs.fed.us/STEW-MAP/data/.

This dataset is associated with the following publication: Sayles, J., R. Furey, and M. Ten Brink. How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations. Applied Network Science. Springer Nature, New York, NY, 7: 36, (2022).
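As a minimal sketch of how the cleaned edge array (node1, node2, edge attribute) could be loaded for network analysis, the snippet below builds a directed graph with networkx; the file and column names are assumptions:

```python
import networkx as nx
import pandas as pd

# Assumed file and column names for the cleaned edge array (node1, node2, edge attribute).
edges = pd.read_csv("stewmap_edges.csv")

# Build a directed hyperlink network: node1 links to node2.
G = nx.from_pandas_edgelist(edges, source="node1", target="node2",
                            edge_attr=True, create_using=nx.DiGraph)

print("Organizations (nodes):", G.number_of_nodes())
print("Hyperlinks (edges):", G.number_of_edges())

# Organizations that receive the most hyperlinks from the others.
top = sorted(nx.in_degree_centrality(G).items(), key=lambda kv: kv[1], reverse=True)[:5]
print(top)
```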
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Percentage of Canadians who have experienced selected personal effects in their life because of the Internet and the use of social networking websites or apps, during the past 12 months.
When asked about "Attitudes towards the internet", most Japanese respondents pick "I could no longer imagine my everyday life without the internet" as an answer. 56 percent did so in our online survey in 2024.
Netlas.io is a set of internet intelligence apps that provide accurate technical information on IP addresses, domain names, websites, web applications, IoT devices, and other online assets.
Netlas.io scans every IPv4 address and every known domain name using protocols such as HTTP, FTP, SMTP, POP3, IMAP, SMB/CIFS, SSH, Telnet, SQL and others. Collected data is enriched with additional information and made available in the Netlas.io Search Engine. Some parts of the Netlas.io database are available as downloadable datasets.
Netlas.io accumulates domain names to make internet scan coverage as wide as possible. Domain names are collected from ICANN Centralized Zone Data Service, SSL Certificates, 301 & 302 HTTP redirects (while scanning) and other sources.
This dataset contains domains and subdomains (all gTLD and ccTLD), that have at least one associated DNS registry entry (A, MX, NS, CNAME and TXT records).
The WebUI dataset contains 400K web UIs captured over a period of 3 months and cost about $500 to crawl. We grouped web pages together by their domain name, then generated training (70%), validation (10%), and testing (20%) splits. This ensured that similar pages from the same website must appear in the same split. We created four versions of the training dataset. Three of these splits were generated by randomly sampling a subset of the training split: Web-7k, Web-70k, Web-350k. We chose 70k as a baseline size, since it is approximately the size of existing UI datasets. We also generated an additional split (Web-7k-Resampled) to provide a small, higher quality split for experimentation. Web-7k-Resampled was generated using a class-balancing sampling technique, and we removed screens with possible visual defects (e.g., very small, occluded, or invisible elements). The validation and test split was always kept the same.
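A minimal sketch of the grouping logic described above, in which whole domains (rather than individual pages) are assigned to splits so that pages from the same website never cross split boundaries; this is an illustrative approximation, not the authors' actual pipeline:

```python
import random
from collections import defaultdict
from urllib.parse import urlparse

def split_by_domain(urls, train=0.7, val=0.1, seed=0):
    """Assign whole domains to train/val/test so that pages from the
    same website never cross split boundaries (an approximation of the
    splitting strategy described above)."""
    # Group page URLs by domain (assumes full URLs with a scheme, e.g. https://...).
    by_domain = defaultdict(list)
    for url in urls:
        by_domain[urlparse(url).netloc].append(url)

    domains = list(by_domain)
    random.Random(seed).shuffle(domains)

    n_train = int(len(domains) * train)
    n_val = int(len(domains) * val)
    groups = {
        "train": domains[:n_train],
        "val": domains[n_train:n_train + n_val],
        "test": domains[n_train + n_val:],
    }
    return {name: [u for d in ds for u in by_domain[d]] for name, ds in groups.items()}

# Example usage with a few dummy URLs.
splits = split_by_domain([
    "https://example.com/a", "https://example.com/b",
    "https://another.org/x", "https://third.net/y",
])
print({k: len(v) for k, v in splits.items()})
```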
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Web Accessibility Improvement: The "Web Page Object Detection" model can be used to identify and label various elements on a web page, making it easier for people with visual impairments to navigate and interact with websites using screen readers and other assistive technologies.
Web Design Analysis: The model can be employed to analyze the structure and layout of popular websites, helping web designers understand best practices and trends in web design. This information can inform the creation of new, user-friendly websites or redesigns of existing pages.
Automatic Web Page Summary Generation: By identifying and extracting key elements, such as titles, headings, content blocks, and lists, the model can assist in generating concise summaries of web pages, which can aid users in their search for relevant information.
Web Page Conversion and Optimization: The model can be used to detect redundant or unnecessary elements on a web page and suggest their removal or modification, leading to cleaner designs and faster-loading pages. This can improve user experience and, potentially, search engine rankings.
Assisting Web Developers in Debugging and Testing: By detecting web page elements, the model can help identify inconsistencies or errors in a site's code or design, such as missing or misaligned elements, allowing developers to quickly diagnose and address these issues.
When asked about "Attitudes towards the internet", most Australian respondents pick "It is important to me to have mobile internet access in any place at any time" as an answer. 53 percent did so in our online survey in 2024.
The population share with mobile internet access in North America was forecast to increase by a total of 2.9 percentage points between 2024 and 2029. This overall increase does not happen continuously, notably not in 2028 and 2029. Mobile internet penetration is estimated to reach 84.21 percent in 2029. Notably, the population share with mobile internet access has been increasing continuously over the past years. The penetration rate refers to the share of the total population having access to the internet via a mobile broadband connection. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the population share with mobile internet access in countries like the Caribbean and Europe.
The global number of smartphone users was forecast to increase continuously between 2024 and 2029 by a total of 1.8 billion users (+42.62 percent). After the ninth consecutive year of growth, the smartphone user base is estimated to reach 6.1 billion users, a new peak, in 2029. Notably, the number of smartphone users has been increasing continuously over the past years. Smartphone users here are limited to internet users of any age using a smartphone. The shown figures have been derived from survey data that has been processed to estimate missing demographics. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of smartphone users in countries like Australia & Oceania and Asia.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset came from a desire to stretch my web scraping skills, as well as to train an LSTM network to maybe compose some lyrics. I detailed how I obtained the data here: Scraping lyrics from Vagalume.
All the data were obtained by scraping the Brazilian website Vagalume using R.
There are two datasets, artists-data.csv and lyrics-data.csv. Originally they had data on only 6 musical genres, but in the last update I scraped all lyrics from the website.
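A minimal sketch of loading the two files and joining them, assuming a shared artist-link column (here called "ALink"); the actual column names may differ:

```python
import pandas as pd

artists = pd.read_csv("artists-data.csv")
lyrics = pd.read_csv("lyrics-data.csv")

# Hypothetical join key: a shared artist-link column, here called "ALink".
songs = lyrics.merge(artists, on="ALink", how="inner")

print(songs.shape)
print(songs.head())
```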
This data is scraped from the Vagalume website, so it depends on their endeavour of storing and sharing millions of song lyrics.
The data scraping for this dataset was inspired by the desire to analyze music data and train an LSTM to compose lyrics.