60 datasets found
  1. Top Visited Websites

    • kaggle.com
    Updated Nov 19, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2022). Top Visited Websites [Dataset]. https://www.kaggle.com/datasets/thedevastator/the-top-websites-in-the-world/discussion
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 19, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Top Websites in the World

    How They Change Over Time

    About this dataset

    This dataset consists of the top 50 most visited websites in the world, as well as the category and principal country/territory for each site. The data provides insights into which sites are most popular globally, and what type of content is most popular in different parts of the world

    How to use the dataset

    This dataset can be used to track the most popular websites in the world over time. It can also be used to compare website popularity between different countries and categories

    Research Ideas

    • To track the most popular websites in the world over time
    • To see how website popularity changes by region
    • To find out which website categories are most popular

    Acknowledgements

    Dataset by Alexa Internet, Inc. (2019), released on Kaggle under the Open Data Commons Public Domain Dedication and License (ODC-PDDL)

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: df_1.csv | Column name | Description | |:--------------------------------|:---------------------------------------------------------------------| | Site | The name of the website. (String) | | Domain Name | The domain name of the website. (String) | | Category | The category of the website. (String) | | Principal country/territory | The principal country/territory where the website is based. (String) |

  2. d

    Custom dataset from any website on the Internet

    • datarade.ai
    Updated Oct 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ScrapeLabs (2024). Custom dataset from any website on the Internet [Dataset]. https://datarade.ai/data-products/custom-dataset-from-any-website-on-the-internet-scrapelabs
    Explore at:
    .bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
    Dataset updated
    Oct 25, 2024
    Dataset authored and provided by
    ScrapeLabs
    Area covered
    India, Kazakhstan, Bulgaria, Tunisia, Jordan, Turks and Caicos Islands, Argentina, Guinea-Bissau, Lebanon, Aruba
    Description

    We'll extract any data from any website on the Internet. You don't have to worry about buying and maintaining complex and expensive software, or hiring developers.

    Some common use cases our customers use the data for: • Data Analysis • Market Research • Price Monitoring • Sales Leads • Competitor Analysis • Recruitment

    We can get data from websites with pagination or scroll, with captchas, and even from behind logins. Text, images, videos, documents.

    Receive data in any format you need: Excel, CSV, JSON, or any other.

  3. 🕵️ Phishing Websites Data

    • kaggle.com
    Updated Feb 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sairaj Adhav (2025). 🕵️ Phishing Websites Data [Dataset]. https://www.kaggle.com/datasets/sai10py/phishing-websites-data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 24, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Sairaj Adhav
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Phishing Websites Dataset

    Overview

    This dataset is designed to aid in the analysis and detection of phishing websites. It contains various features that help distinguish between legitimate and phishing websites based on their structural, security, and behavioral attributes.

    Dataset Information

    • Total Columns: 31 (30 Features + 1 Target)
    • Target Variable: Result (Indicates whether a website is phishing or legitimate)

    Features Description

    URL-Based Features

    • Prefix_Suffix – Checks if the URL contains a hyphen (-), which is commonly used in phishing domains.
    • double_slash_redirecting – Detects if the URL redirects using //, which may indicate a phishing attempt.
    • having_At_Symbol – Identifies the presence of @ in the URL, which can be used to deceive users.
    • Shortining_Service – Indicates whether the URL uses a shortening service (e.g., bit.ly, tinyurl).
    • URL_Length – Measures the length of the URL; phishing URLs tend to be longer.
    • having_IP_Address – Checks if an IP address is used in place of a domain name, which is suspicious.

    Domain-Based Features

    • having_Sub_Domain – Evaluates the number of subdomains; phishing sites often have excessive subdomains.
    • SSLfinal_State – Indicates whether the website has a valid SSL certificate (secure connection).
    • Domain_registeration_length – Measures the duration of domain registration; phishing sites often have short lifespans.
    • age_of_domain – The age of the domain in days; older domains are usually more trustworthy.
    • DNSRecord – Checks if the domain has valid DNS records; phishing domains may lack these.

    Webpage-Based Features

    • Favicon – Determines if the website uses an external favicon (which can be a sign of phishing).
    • port – Identifies if the site is using suspicious or non-standard ports.
    • HTTPS_token – Checks if "HTTPS" is included in the URL but is used deceptively.
    • Request_URL – Measures the percentage of external resources loaded from different domains.
    • URL_of_Anchor – Analyzes anchor tags (<a> links) and their trustworthiness.
    • Links_in_tags – Examines <meta>, <script>, and <link> tags for external links.
    • SFH (Server Form Handler) – Determines if form actions are handled suspiciously.
    • Submitting_to_email – Checks if forms submit data directly to an email instead of a web server.
    • Abnormal_URL – Identifies if the website’s URL structure is inconsistent with common patterns.
    • Redirect – Counts the number of redirects; phishing websites may have excessive redirects.

    Behavior-Based Features

    • on_mouseover – Checks if the website changes content when hovered over (used in deceptive techniques).
    • RightClick – Detects if right-click functionality is disabled (phishing sites may disable it).
    • popUpWindow – Identifies the presence of pop-ups, which can be used to trick users.
    • Iframe – Checks if the website uses <iframe> tags, often used in phishing attacks.

    Traffic & Search Engine Features

    • web_traffic – Measures the website’s Alexa ranking; phishing sites tend to have low traffic.
    • Page_Rank – Google PageRank score; phishing sites usually have a low PageRank.
    • Google_Index – Checks if the website is indexed by Google (phishing sites may not be indexed).
    • Links_pointing_to_page – Counts the number of backlinks pointing to the website.
    • Statistical_report – Uses external sources to verify if the website has been reported for phishing.

    Target Variable

    • Result – The classification label (1: Legitimate, -1: Phishing)

    Usage

    This dataset is valuable for:
    Machine Learning Models – Developing classifiers for phishing detection.
    Cybersecurity Research – Understanding patterns in phishing attacks.
    Browser Security Extensions – Enhancing anti-phishing tools.

  4. Number of internet users worldwide 2014-2029

    • statista.com
    Updated Apr 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2025). Number of internet users worldwide 2014-2029 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Statista Research Department
    Area covered
    World
    Description

    The global number of internet users in was forecast to continuously increase between 2024 and 2029 by in total 1.3 billion users (+23.66 percent). After the fifteenth consecutive increasing year, the number of users is estimated to reach 7 billion users and therefore a new peak in 2029. Notably, the number of internet users of was continuously increasing over the past years.Depicted is the estimated number of individuals in the country or region at hand, that use the internet. As the datasource clarifies, connection quality and usage frequency are distinct aspects, not taken into account here.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).Find more key insights for the number of internet users in countries like the Americas and Asia.

  5. Most visited websites by hierachycal categories

    • kaggle.com
    Updated Sep 18, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Natanael de Souza Figueiredo (2020). Most visited websites by hierachycal categories [Dataset]. https://www.kaggle.com/natanael127/most-visited-websites-by-hierachycal-categories/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 18, 2020
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Natanael de Souza Figueiredo
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Alexa Internet was founded in April 1996 by Brewster Kahle and Bruce Gilliat. The company's name was chosen in homage to the Library of Alexandria of Ptolemaic Egypt, drawing a parallel between the largest repository of knowledge in the ancient world and the potential of the Internet to become a similar store of knowledge. (from Wikipedia)

    The categories list was going out by September, 17h, 2020. So I would like to save it. https://support.alexa.com/hc/en-us/articles/360051913314

    This dataset was elaborated by this python script (V2.0): https://github.com/natanael127/dump-alexa-ranking

    Content

    The sites are grouped in 17 macro categories and this tree ends having more than 360.000 nodes. Subjects are very organized and each of them has its own rank of most accessed domains. So, even the keys of a sub-dictionary may be a good small dataset to use.

    Acknowledgements

    Thank you my friend André (https://github.com/andrerclaudio) by helping me with tips of Google Colaboratory and computational power to get the data until our deadline.

    Inspiration

    Alexa ranking was inspired by Library of Alexandria. In the modern world, it may be a good start for AI know more about many, many subjects of the world.

  6. Attitudes towards the internet in Mexico 2024

    • statista.com
    Updated Apr 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Umair Bashir (2025). Attitudes towards the internet in Mexico 2024 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Umair Bashir
    Description

    When asked about "Attitudes towards the internet", most Mexican respondents pick "It is important to me to have mobile internet access in any place at any time" as an answer. 55 percent did so in our online survey in 2024. Looking to gain valuable insights about users of internet providers worldwide? Check out our

  7. Crunchyroll Meta-Data

    • kaggle.com
    Updated Aug 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    BIT_Guber (2023). Crunchyroll Meta-Data [Dataset]. https://www.kaggle.com/datasets/bitguber/crunchyroll-meta-data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 15, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    BIT_Guber
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is just prepared data from crunchyroll web scraped data using code line here I extracted meta-data from crunchyroll websites.

    Before please visit Crunchyroll

    Dataset contains 7 files

    popular.csv

    Each row represented a series in popular page. note: some information not updated ( I guess Crunchyroll not update is Popular table in Database )

    series.csv

    It's also have similar feature as popular.csv but updated data points.

    seasons.csv

    Each row represented a season from it's corresponding series.

    episodes.csv

    Information about individual episodes from it's corresponding series.

    series_music.csv

    Some series have featured music collection.

    audio.json

    Mapping full representation of audio version of episode dubbed.

    categories.json

    Mapping each categories of series ,it defined by crunchyroll.

  8. Attitudes towards the internet in China 2024

    • statista.com
    Updated Apr 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Umair Bashir (2025). Attitudes towards the internet in China 2024 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Umair Bashir
    Description

    When asked about "Attitudes towards the internet", most Chinese respondents pick "It is important to me to have mobile internet access in any place at any time" as an answer. 49 percent did so in our online survey in 2024. Looking to gain valuable insights about users of internet providers worldwide? Check out our

  9. G

    Adverse effects of using the Internet and social networking websites or apps...

    • open.canada.ca
    • www150.statcan.gc.ca
    • +2more
    csv, html, xml
    Updated Jan 17, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statistics Canada (2023). Adverse effects of using the Internet and social networking websites or apps by gender and age group, inactive [Dataset]. https://open.canada.ca/data/en/dataset/80c88ac9-8ea1-4ff7-856e-560f7683d660
    Explore at:
    html, xml, csvAvailable download formats
    Dataset updated
    Jan 17, 2023
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    Percentage of Internet users who have experienced selected personal effects in their life because of the Internet and the use of social networking websites or apps, during the past 12 months.

  10. h

    Mind2Web

    • huggingface.co
    Updated Jun 12, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    OSU NLP Group (2023). Mind2Web [Dataset]. https://huggingface.co/datasets/osunlp/Mind2Web
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 12, 2023
    Dataset authored and provided by
    OSU NLP Group
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Dataset Name

      Dataset Summary
    

    Mind2Web is a dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing datasets for web agents either use simulated websites or only cover a limited set of websites and tasks, thus not suitable for generalist web agents. With over 2,000 open-ended tasks collected from 137 websites spanning 31 domains and crowdsourced action… See the full description on the dataset page: https://huggingface.co/datasets/osunlp/Mind2Web.

  11. NYC STEW-MAP Staten Island organizations' website hyperlink webscrape

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 21, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. EPA Office of Research and Development (ORD) (2022). NYC STEW-MAP Staten Island organizations' website hyperlink webscrape [Dataset]. https://catalog.data.gov/dataset/nyc-stew-map-staten-island-organizations-website-hyperlink-webscrape
    Explore at:
    Dataset updated
    Nov 21, 2022
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Area covered
    Staten Island, New York
    Description

    The data represent web-scraping of hyperlinks from a selection of environmental stewardship organizations that were identified in the 2017 NYC Stewardship Mapping and Assessment Project (STEW-MAP) (USDA 2017). There are two data sets: 1) the original scrape containing all hyperlinks within the websites and associated attribute values (see "README" file); 2) a cleaned and reduced dataset formatted for network analysis. For dataset 1: Organizations were selected from from the 2017 NYC Stewardship Mapping and Assessment Project (STEW-MAP) (USDA 2017), a publicly available, spatial data set about environmental stewardship organizations working in New York City, USA (N = 719). To create a smaller and more manageable sample to analyze, all organizations that intersected (i.e., worked entirely within or overlapped) the NYC borough of Staten Island were selected for a geographically bounded sample. Only organizations with working websites and that the web scraper could access were retained for the study (n = 78). The websites were scraped between 09 and 17 June 2020 to a maximum search depth of ten using the snaWeb package (version 1.0.1, Stockton 2020) in the R computational language environment (R Core Team 2020). For dataset 2: The complete scrape results were cleaned, reduced, and formatted as a standard edge-array (node1, node2, edge attribute) for network analysis. See "READ ME" file for further details. References: R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/. Version 4.0.3. Stockton, T. (2020). snaWeb Package: An R package for finding and building social networks for a website, version 1.0.1. USDA Forest Service. (2017). Stewardship Mapping and Assessment Project (STEW-MAP). New York City Data Set. Available online at https://www.nrs.fs.fed.us/STEW-MAP/data/. This dataset is associated with the following publication: Sayles, J., R. Furey, and M. Ten Brink. How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations. Applied Network Science. Springer Nature, New York, NY, 7: 36, (2022).

  12. G

    Selected social outcomes of using the Internet and social networking...

    • open.canada.ca
    • www150.statcan.gc.ca
    • +1more
    csv, html, xml
    Updated Jan 17, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statistics Canada (2023). Selected social outcomes of using the Internet and social networking websites or apps by gender and age group [Dataset]. https://open.canada.ca/data/en/dataset/971e1d31-a88f-41f6-a68d-1e1f236da491
    Explore at:
    csv, html, xmlAvailable download formats
    Dataset updated
    Jan 17, 2023
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    Percentage of Canadians who have experienced selected personal effects in their life because of the Internet and the use of social networking websites or apps, during the past 12 months.

  13. Attitudes towards the internet in Japan 2024

    • statista.com
    Updated Apr 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Umair Bashir (2025). Attitudes towards the internet in Japan 2024 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Umair Bashir
    Description

    When asked about "Attitudes towards the internet", most Japanese respondents pick "I could no longer imagine my everyday life without the internet" as an answer. 56 percent did so in our online survey in 2024. Looking to gain valuable insights about users of internet providers worldwide? Check out our

  14. d

    Forward DNS (A, MX, NS, CNAME and TXT records)

    • datarade.ai
    .json, .csv
    Updated Nov 29, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Netlas.io (2021). Forward DNS (A, MX, NS, CNAME and TXT records) [Dataset]. https://datarade.ai/data-products/whole-dns-registry-a-mx-ns-cname-and-txt-records-netlas
    Explore at:
    .json, .csvAvailable download formats
    Dataset updated
    Nov 29, 2021
    Dataset provided by
    Netlas.io
    Area covered
    New Caledonia, Nauru, Djibouti, India, Åland Islands, Saint Helena, Mauritius, Seychelles, Falkland Islands (Malvinas), Ecuador
    Description

    Netlas.io is a set of internet intelligence apps that provide accurate technical information on IP addresses, domain names, websites, web applications, IoT devices, and other online assets.

    Netlas.io scans every IPv4 address and every known domain name utilizing such protocols as HTTP, FTP, SMTP, POP3, IMAP, SMB/CIFS, SSH, Telnet, SQL and others. Collected data is enriched with additional info and available in Netlas.io Search Engine. Some parts of Netlas.io database is available as downloadable datasets.

    Netlas.io accumulates domain names to make internet scan coverage as wide as possible. Domain names are collected from ICANN Centralized Zone Data Service, SSL Certificates, 301 & 302 HTTP redirects (while scanning) and other sources.

    This dataset contains domains and subdomains (all gTLD and ccTLD), that have at least one associated DNS registry entry (A, MX, NS, CNAME and TXT records).

  15. P

    WebUI Dataset

    • paperswithcode.com
    Updated Mar 15, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). WebUI Dataset [Dataset]. https://paperswithcode.com/dataset/webui
    Explore at:
    Dataset updated
    Mar 15, 2025
    Description

    The WebUI dataset contains 400K web UIs captured over a period of 3 months and cost about $500 to crawl. We grouped web pages together by their domain name, then generated training (70%), validation (10%), and testing (20%) splits. This ensured that similar pages from the same website must appear in the same split. We created four versions of the training dataset. Three of these splits were generated by randomly sampling a subset of the training split: Web-7k, Web-70k, Web-350k. We chose 70k as a baseline size, since it is approximately the size of existing UI datasets. We also generated an additional split (Web-7k-Resampled) to provide a small, higher quality split for experimentation. Web-7k-Resampled was generated using a class-balancing sampling technique, and we removed screens with possible visual defects (e.g., very small, occluded, or invisible elements). The validation and test split was always kept the same.

  16. R

    Web Page Object Detection Dataset

    • universe.roboflow.com
    zip
    Updated Mar 2, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    web page summarizer (2023). Web Page Object Detection Dataset [Dataset]. https://universe.roboflow.com/web-page-summarizer/web-page-object-detection
    Explore at:
    zipAvailable download formats
    Dataset updated
    Mar 2, 2023
    Dataset authored and provided by
    web page summarizer
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Web Page Elements Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Web Accessibility Improvement: The "Web Page Object Detection" model can be used to identify and label various elements on a web page, making it easier for people with visual impairments to navigate and interact with websites using screen readers and other assistive technologies.

    2. Web Design Analysis: The model can be employed to analyze the structure and layout of popular websites, helping web designers understand best practices and trends in web design. This information can inform the creation of new, user-friendly websites or redesigns of existing pages.

    3. Automatic Web Page Summary Generation: By identifying and extracting key elements, such as titles, headings, content blocks, and lists, the model can assist in generating concise summaries of web pages, which can aid users in their search for relevant information.

    4. Web Page Conversion and Optimization: The model can be used to detect redundant or unnecessary elements on a web page and suggest their removal or modification, leading to cleaner designs and faster-loading pages. This can improve user experience and, potentially, search engine rankings.

    5. Assisting Web Developers in Debugging and Testing: By detecting web page elements, the model can help identify inconsistencies or errors in a site's code or design, such as missing or misaligned elements, allowing developers to quickly diagnose and address these issues.

  17. Attitudes towards the internet in Australia 2024

    • statista.com
    Updated Apr 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Umair Bashir (2025). Attitudes towards the internet in Australia 2024 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Umair Bashir
    Description

    When asked about "Attitudes towards the internet", most Australian respondents pick "It is important to me to have mobile internet access in any place at any time" as an answer. 53 percent did so in our online survey in 2024. Looking to gain valuable insights about users of internet providers worldwide? Check out our

  18. Mobile internet usage reach in North America 2020-2029

    • statista.com
    Updated Feb 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2025). Mobile internet usage reach in North America 2020-2029 [Dataset]. https://www.statista.com/topics/779/mobile-internet/
    Explore at:
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Statista Research Department
    Description

    The population share with mobile internet access in North America was forecast to increase between 2024 and 2029 by in total 2.9 percentage points. This overall increase does not happen continuously, notably not in 2028 and 2029. The mobile internet penetration is estimated to amount to 84.21 percent in 2029. Notably, the population share with mobile internet access of was continuously increasing over the past years.The penetration rate refers to the share of the total population having access to the internet via a mobile broadband connection.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).Find more key insights for the population share with mobile internet access in countries like Caribbean and Europe.

  19. Mobile internet users worldwide 2020-2029

    • statista.com
    Updated Feb 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2025). Mobile internet users worldwide 2020-2029 [Dataset]. https://www.statista.com/topics/779/mobile-internet/
    Explore at:
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Statista Research Department
    Description

    The global number of smartphone users in was forecast to continuously increase between 2024 and 2029 by in total 1.8 billion users (+42.62 percent). After the ninth consecutive increasing year, the smartphone user base is estimated to reach 6.1 billion users and therefore a new peak in 2029. Notably, the number of smartphone users of was continuously increasing over the past years.Smartphone users here are limited to internet users of any age using a smartphone. The shown figures have been derived from survey data that has been processed to estimate missing demographics.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).Find more key insights for the number of smartphone users in countries like Australia & Oceania and Asia.

  20. Song lyrics from 79 musical genres

    • kaggle.com
    Updated Mar 17, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anderson Neisse (2022). Song lyrics from 79 musical genres [Dataset]. https://www.kaggle.com/datasets/neisse/scrapped-lyrics-from-6-genres/data?select=artists-data.csv
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 17, 2022
    Dataset provided by
    Kaggle
    Authors
    Anderson Neisse
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    This dataset came from a desire for stretching my web scrapping skills as well as to trian a LSTM network to maybe compose some lyrics. I detailed how I obtained the data here: Scraping lyrics from Vagalume.

    Content

    All the data were obtained by scraping the Brazilian website Vagalume using R.

    There are two datasets artists-data.csv and lyrics-data.csv, originally they had data on only 6 musical genres, but on the last uptade i scraped all lyrics from the website.

    Acknowledgements

    This data is scraped from the Vagalume website, so it depends on their endavour on storing and sharing milions of song lyrics.

    Inspiration

    The data scraping of this dataset was inspired by the desire to analyze the data on music and train a LSTM to compose lyrics.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
The Devastator (2022). Top Visited Websites [Dataset]. https://www.kaggle.com/datasets/thedevastator/the-top-websites-in-the-world/discussion
Organization logo

Top Visited Websites

A dataset of the top visited websites on the internet

Explore at:
72 scholarly articles cite this dataset (View in Google Scholar)
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 19, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
The Devastator
License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

The Top Websites in the World

How They Change Over Time

About this dataset

This dataset consists of the top 50 most visited websites in the world, as well as the category and principal country/territory for each site. The data provides insights into which sites are most popular globally, and what type of content is most popular in different parts of the world

How to use the dataset

This dataset can be used to track the most popular websites in the world over time. It can also be used to compare website popularity between different countries and categories

Research Ideas

  • To track the most popular websites in the world over time
  • To see how website popularity changes by region
  • To find out which website categories are most popular

Acknowledgements

Dataset by Alexa Internet, Inc. (2019), released on Kaggle under the Open Data Commons Public Domain Dedication and License (ODC-PDDL)

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: df_1.csv | Column name | Description | |:--------------------------------|:---------------------------------------------------------------------| | Site | The name of the website. (String) | | Domain Name | The domain name of the website. (String) | | Category | The category of the website. (String) | | Principal country/territory | The principal country/territory where the website is based. (String) |

Search
Clear search
Close search
Google apps
Main menu