100+ datasets found
  1. Websites using data-urls

    • webtechsurvey.com
    csv
    Updated Feb 10, 2025
    Cite
    WebTechSurvey (2025). Websites using data-urls [Dataset]. https://webtechsurvey.com/technology/data-urls
    Available download formats
    csv
    Dataset updated
    Feb 10, 2025
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the data-urls technology, compiled through global website indexing conducted by WebTechSurvey.

  2. Global Web Data | Web Scraping Data | Job Postings Data | Source: Company...

    • datarade.ai
    .json
    Cite
    PredictLeads, Global Web Data | Web Scraping Data | Job Postings Data | Source: Company Website | 232M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-data-web-scraping-data-job-postings-dat-predictleads
    Available download formats
    .json
    Dataset authored and provided by
    PredictLeads
    Area covered
    El Salvador, Bonaire, French Guiana, Virgin Islands (British), Northern Mariana Islands, Kosovo, Comoros, Guadeloupe, Bosnia and Herzegovina, Kuwait
    Description

    PredictLeads Job Openings Data provides high-quality hiring insights sourced directly from company websites - not job boards. Using advanced web scraping technology, our dataset offers real-time access to job trends, salaries, and skills demand, making it a valuable resource for B2B sales, recruiting, investment analysis, and competitive intelligence.

    Key Features:

    • 232M+ Job Postings Tracked – Data sourced from 92 million company websites worldwide.
    • 7,1M+ Active Job Openings – Updated in real-time to reflect hiring demand.
    • Salary & Compensation Insights – Extract salary ranges, contract types, and job seniority levels.
    • Technology & Skill Tracking – Identify emerging tech trends and industry demands.
    • Company Data Enrichment – Link job postings to employer domains, firmographics, and growth signals.
    • Web Scraping Precision – Directly sourced from employer websites for unmatched accuracy.

    Primary Attributes:

    • id (string, UUID) – Unique identifier for the job posting.
    • type (string, constant: "job_opening") – Object type.
    • title (string) – Job title.
    • description (string) – Full job description, extracted from the job listing.
    • url (string, URL) – Direct link to the job posting.
    • first_seen_at – Timestamp when the job was first detected.
    • last_seen_at – Timestamp when the job was last detected.
    • last_processed_at – Timestamp when the job data was last processed.

    Job Metadata:

    • contract_types (array of strings) – Type of employment (e.g., "full time", "part time", "contract").
    • categories (array of strings) – Job categories (e.g., "engineering", "marketing").
    • seniority (string) – Seniority level of the job (e.g., "manager", "non_manager").
    • status (string) – Job status (e.g., "open", "closed").
    • language (string) – Language of the job posting.
    • location (string) – Full location details as listed in the job description.

    Location Data (location_data) (array of objects)

    • city (string, nullable) – City where the job is located.
    • state (string, nullable) – State or region of the job location.
    • zip_code (string, nullable) – Postal/ZIP code.
    • country (string, nullable) – Country where the job is located.
    • region (string, nullable) – Broader geographical region.
    • continent (string, nullable) – Continent name.
    • fuzzy_match (boolean) – Indicates whether the location was inferred.

    Salary Data (salary_data)

    • salary (string) – Salary range extracted from the job listing.
    • salary_low (float, nullable) – Minimum salary in original currency.
    • salary_high (float, nullable) – Maximum salary in original currency.
    • salary_currency (string, nullable) – Currency of the salary (e.g., "USD", "EUR").
    • salary_low_usd (float, nullable) – Converted minimum salary in USD.
    • salary_high_usd (float, nullable) – Converted maximum salary in USD.
    • salary_time_unit (string, nullable) – Time unit for the salary (e.g., "year", "month", "hour").

    Occupational Data (onet_data) (object, nullable)

    • code (string, nullable) – O*NET occupation code.
    • family (string, nullable) – Broad occupational family (e.g., "Computer and Mathematical").
    • occupation_name (string, nullable) – Official O*NET occupation title.

    Additional Attributes:

    • tags (array of strings, nullable) – Extracted skills and keywords (e.g., "Python", "JavaScript").
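
As a rough sketch of how a record with the attributes listed above might be read in Python: the sample record, its values, and the nesting (a flat object with `location_data`, `salary_data`, and `onet_data` inside it) are assumptions made for illustration and are not taken from the PredictLeads documentation. The helper names `salary_midpoint_usd` and `countries` are hypothetical.

```python
import json

# Hypothetical record that reuses the attribute names listed above.
# Every value here is invented for illustration.
SAMPLE = json.loads("""
{
  "id": "00000000-0000-0000-0000-000000000000",
  "type": "job_opening",
  "title": "Data Engineer",
  "url": "https://example.com/jobs/1",
  "status": "open",
  "contract_types": ["full time"],
  "location_data": [
    {"city": "Berlin", "country": "Germany", "fuzzy_match": false}
  ],
  "salary_data": {
    "salary_low_usd": 60000.0,
    "salary_high_usd": null,
    "salary_time_unit": "year"
  },
  "onet_data": null,
  "tags": ["Python"]
}
""")


def salary_midpoint_usd(record):
    """Return the midpoint of the USD salary range, or None if a bound is missing."""
    salary = record.get("salary_data") or {}
    low = salary.get("salary_low_usd")
    high = salary.get("salary_high_usd")
    if low is None or high is None:
        return None  # nullable fields: do not guess a value
    return (low + high) / 2


def countries(record):
    """Collect the non-null country names from location_data."""
    locations = record.get("location_data") or []
    return [loc["country"] for loc in locations if loc.get("country")]


print(salary_midpoint_usd(SAMPLE))
print(countries(SAMPLE))
```

Because many fields are marked nullable, the helpers check for missing values before using them.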

    📌 Trusted by enterprises, recruiters, and investors for high-precision job market insights.

    PredictLeads Dataset: https://docs.predictleads.com/v3/guide/job_openings_dataset

  3. China CN: Internet Service: No of Website

    • ceicdata.com
    Updated Oct 15, 2025
    Cite
    CEICdata.com (2025). China CN: Internet Service: No of Website [Dataset]. https://www.ceicdata.com/en/china/internet-number-of-domain-and-website/cn-internet-service-no-of-website
    Dataset updated
    Oct 15, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 1, 2019 - Dec 1, 2024
    Area covered
    China
    Variables measured
    Internet Statistics
    Description

    China Internet Service: Number of Website data was reported at 4.460 Unit mn in Dec 2024, an increase from the previous figure of 3.910 Unit mn for Jun 2024. The data is updated semiannually, averaging 2.939 Unit mn (median) from Dec 2000 to Dec 2024, with 49 observations. It reached an all-time high of 5.440 Unit mn in Jun 2018 and a record low of 0.243 Unit mn in Jun 2001. The series remains in active status in CEIC and is reported by China Internet Network Information Center. It is categorized under China Premium Database’s Information and Communication Sector – Table CN.ICE: Internet: Number of Domain and Website.

  4. Websites using Advanced Database Cleaner

    • webtechsurvey.com
    csv
    Cite
    WebTechSurvey, Websites using Advanced Database Cleaner [Dataset]. https://webtechsurvey.com/technology/advanced-database-cleaner
    Available download formats
    csv
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Advanced Database Cleaner technology, compiled through global website indexing conducted by WebTechSurvey.

  5. Web Designer Express | Graphics Multimedia & Web Design | Technology Data

    • datastore.forage.ai
    Updated Sep 22, 2024
    Cite
    (2024). Web Designer Express | Graphics Multimedia & Web Design | Technology Data [Dataset]. https://datastore.forage.ai/searchresults/?resource_keyword=web
    Dataset updated
    Sep 22, 2024
    Description

    Web Designer Express is a reputable Miami-based company that has been in business for 20 years. With a team of experienced web designers and developers, they offer a wide range of services, including web design, e-commerce development, web development, and more. Their portfolio showcases over 10,000 websites designed, with a focus on creating custom, unique solutions for each client. With a presence in Miami, Florida, they cater to businesses and individuals seeking to establish a strong online presence. As a company, Web Designer Express is dedicated to building long-lasting relationships with their clients, providing personalized service, and exceeding expectations.

  6. Open Data Website Traffic

    • catalog.data.gov
    • data.lacity.org
    • +1more
    Updated Jun 21, 2025
    Cite
    data.lacity.org (2025). Open Data Website Traffic [Dataset]. https://catalog.data.gov/dataset/open-data-website-traffic
    Dataset updated
    Jun 21, 2025
    Dataset provided by
    data.lacity.org
    Description

    Daily utilization metrics for data.lacity.org and geohub.lacity.org. Updated monthly.

  7. Consumers fine with websites using their data to send them relevant ads U.S....

    • statista.com
    Updated May 16, 2025
    Cite
    Statista (2025). Consumers fine with websites using their data to send them relevant ads U.S. 2024 [Dataset]. https://www.statista.com/statistics/1612730/websites-use-data-relevant-ads-usa/
    Dataset updated
    May 16, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    United States
    Description

    During a 2024 survey, ** percent of responding consumers from the United States said they were fine with a website or app that they trusted or valued using their personal data to send them relevant advertising. The share stood at ** percent for Generation Z respondents.

  8. Share of top U.S. websites ignoring user privacy preferences 2024

    • statista.com
    Updated Mar 4, 2025
    Cite
    Statista (2025). Share of top U.S. websites ignoring user privacy preferences 2024 [Dataset]. https://www.statista.com/statistics/1560221/us-privacy-preference-ignoring/
    Dataset updated
    Mar 4, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Sep 2024
    Area covered
    United States
    Description

    As of September 2024, 75 percent of the 100 most visited websites in the United States shared personal data with advertising 3rd parties, even when users opted out. Moreover, 70 percent of them drop advertising 3rd party cookies even when users opt out.

  9. Data from: Structural Profiling of Web Sites in the Wild

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jun 10, 2020
    Cite
    Xavier Chamberland-Thibeault; Sylvain Hallé (2020). Structural Profiling of Web Sites in the Wild [Dataset]. http://doi.org/10.5281/zenodo.3718598
    Available download formats
    zip
    Dataset updated
    Jun 10, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Xavier Chamberland-Thibeault; Sylvain Hallé
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains and processes results of a large-scale survey of 708 websites, made in December 2019, in order to measure various features related to their size and structure: DOM tree size, maximum degree, depth, diversity of element types and CSS classes, among others. The goal of this research is to serve as a reference point for studies that include an empirical evaluation on samples of web pages.

    See the Readme.md file inside the archive for more details about its contents.

  10. State of California - Data

    • data.wu.ac.at
    Updated Oct 11, 2013
    Cite
    Global (2013). State of California - Data [Dataset]. https://data.wu.ac.at/odso/datahub_io/NDZlMmFjNWEtMGY1ZS00ZWVhLTgzZWEtMmY5ZmFhMGQyMjEx
    Dataset updated
    Oct 11, 2013
    Dataset provided by
    Global
    Description

    About

    Data from the State of California. From website:

    Access raw State data files, databases, geographic data, and other data sources. Raw State data files can be reused by citizens and organizations for their own web applications and mashups.

    Openness

    Open. Effectively in the public domain. Terms of use page says:

    In general, information presented on this web site, unless otherwise indicated, is considered in the public domain. It may be distributed or copied as permitted by law. However, the State does make use of copyrighted data (e.g., photographs) which may require additional permissions prior to your use. In order to use any information on this web site not owned or created by the State, you must seek permission directly from the owning (or holding) sources. The State shall have the unlimited right to use for any purpose, free of any charge, all information submitted via this site except those submissions made under separate legal contract. The State shall be free to use, for any purpose, any ideas, concepts, or techniques contained in information provided through this site.

  11. State of Oklahoma City Government Websites

    • catalog.data.gov
    • data.ok.gov
    • +3more
    Updated Nov 22, 2024
    Cite
    data.ok.gov (2024). State of Oklahoma City Government Websites [Dataset]. https://catalog.data.gov/dataset/state-of-oklahoma-city-government-websites-bdb86
    Dataset updated
    Nov 22, 2024
    Dataset provided by
    data.ok.gov
    Area covered
    Oklahoma City, Oklahoma
    Description

    List of State of Oklahoma city government websites.

  12. black website

    • kaggle.com
    zip
    Updated Mar 23, 2023
    Cite
    listone (2023). black website [Dataset]. https://www.kaggle.com/datasets/listone/black-website
    Available download formats
    zip (22129518491 bytes)
    Dataset updated
    Mar 23, 2023
    Authors
    listone
    Description

    The data can only be used for scientific research; commercial use is strictly prohibited. This is an underground industry website dataset. It contains nearly 400,000 pieces of data. Each piece of data contains 14 attributes. All properties are contained in the result.json file.

    | Property | Description | Data type |
    | --- | --- | --- |
    | ip | IP address | character string |
    | port | port number | continuous data |
    | server | web container | discrete data |
    | domain | domain name | text (domain name) |
    | title | site title | text |
    | org | organization | discrete data |
    | country | country | discrete data |
    | city | city | discrete data |
    | html | HTML original code | text |
    | screen | website screenshot | image |
    | header | Web response header information | text |
    | subject.CN | Common name information for SSL certificates | text (domain name) |
    | subject.N | SSL certificate subject optional name | text (list of domain names) |
    | links | Site external link | text (list of domain names) |

  13. B2B Contact Data Scraped from Company Website | B2B Email Data, Phone...

    • datarade.ai
    .json, .csv
    Updated Apr 27, 2024
    Cite
    OpenWeb Ninja (2024). B2B Contact Data Scraped from Company Website | B2B Email Data, Phone Numbers Data, Social Profile Links | Real-Time API [Dataset]. https://datarade.ai/data-products/openweb-ninja-scrape-company-website-for-b2b-contact-data-openweb-ninja
    Available download formats
    .json, .csv
    Dataset updated
    Apr 27, 2024
    Dataset authored and provided by
    OpenWeb Ninja
    Area covered
    Germany, Iran (Islamic Republic of), France, Korea (Democratic People's Republic of), Morocco, Libya, Belarus, South Sudan, Bouvet Island, Cayman Islands
    Description

    OpenWeb Ninja’s Website Contacts Scraper API provides real-time access to B2B contact data directly from company websites and related public sources. The API delivers clean, structured results including B2B email data, phone number data, and social profile links, making it simple to enrich leads and build accurate company contact lists at scale.

    What's included:

    • Emails & Phone Numbers: extract business emails and phone contacts from a website domain.
    • Social Profile Links: capture company accounts on LinkedIn, Facebook, Instagram, TikTok, Twitter/X, YouTube, GitHub, and Pinterest.
    • Domain Search: input a company website domain and get all available contact details.
    • Company Name Lookup: find a company’s website domain by name, then retrieve its contact data.
    • Comprehensive Coverage: scrape across all accessible website pages for maximum data capture.

    Coverage & Scale:

    • 1,000+ emails and phone numbers per company website supported.
    • 8+ major social networks covered.
    • Real-time REST API for fast, reliable delivery.

    Use cases:

    • B2B contact enrichment and CRM updates.
    • Targeted email marketing campaigns.
    • Sales prospecting and lead generation.
    • Digital ads audience targeting.
    • Marketing and sales intelligence.

    With OpenWeb Ninja’s Website Contacts Scraper API, you get structured B2B email data, phone numbers, and social profiles straight from company websites - always delivered in real time via a fast and reliable API.

  14. Data from: Congressional Candidate Websites

    • icpsr.umich.edu
    ascii, delimited, r +3
    Updated Nov 25, 2013
    Cite
    Druckman, James; Parkin, Michael; Kifer, Martin (2013). Congressional Candidate Websites [Dataset]. http://doi.org/10.3886/ICPSR34895.v1
    Available download formats
    r, delimited, stata, sas, ascii, spss
    Dataset updated
    Nov 25, 2013
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    Druckman, James; Parkin, Michael; Kifer, Martin
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/34895/terms

    Time period covered
    2002 - 2006
    Area covered
    United States
    Description

    The Congressional Candidate Websites study uses congressional candidate Web site data from 2002 to 2006 to understand campaign behavior. The content analysis data includes information on major party House and Senate candidates, their districts/states, and aspects of their campaign Web sites including their use of technology and political variables such as endorsements, issue positions, image promotion, and negative commentary.

  15. Concerns over the protection of personal data by websites in Sweden 2018

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Concerns over the protection of personal data by websites in Sweden 2018 [Dataset]. https://www.statista.com/statistics/498171/concerns-over-the-protection-of-personal-data-by-websites-in-sweden/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2019
    Area covered
    Sweden
    Description

    The majority of the Swedes who took part in a survey conducted in 2019 stated they were concerned that their online information was not kept secure by websites (** percent). ** percent of the respondents disagreed with that statement.

  16. Network Traffic Analysis: Data and Code

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jun 12, 2024
    Cite
    Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Loyola University Chicago
    Authors
    Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code:

    Packet_Features_Generator.py & Features.py

    To run this code:

    pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

    -h, --help    show this help message and exit
    -i TXTFILE    input text file
    -x X          Add first X number of total packets as features.
    -y Y          Add first Y number of negative packets as features.
    -z Z          Add first Z number of positive packets as features.
    -ml           Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
    -s S          Generate samples using size s.
    -j

    Purpose:

    Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associative features.

    Uses Features.py to calculate the features.

    startMachineLearning.sh & machineLearning.py

    To run this code:

    bash startMachineLearning.sh

    This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.

    Options (to be edited within this file):

    --evaluate-only to test 5 fold cross validation accuracy

    --test-scaling-normalization to test 6 different combinations of scalers and normalizers

    Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

    --grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

    Purpose:

    Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and provides results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.

    Data

    Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

    Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:

    First number is a classification number to denote what website, query, or vr action is taking place.

    The remaining numbers in each line denote:

    The size of a packet,

    and the direction it is traveling.

    negative numbers denote incoming packets

    positive numbers denote outgoing packets
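
The line format described above can be sketched in Python as follows. The delimiter is not stated in the description, so this sketch assumes values are separated by whitespace or commas; the function name `parse_capture_line` and the sample line are hypothetical and not part of the published code.

```python
def parse_capture_line(line):
    """Split one capture line into (label, incoming_sizes, outgoing_sizes).

    Assumption: values are separated by whitespace and/or commas.
    """
    tokens = line.replace(",", " ").split()
    label = int(tokens[0])                    # classification number
    sizes = [int(tok) for tok in tokens[1:]]  # signed packet sizes
    incoming = [-s for s in sizes if s < 0]   # negative = incoming
    outgoing = [s for s in sizes if s > 0]    # positive = outgoing
    return label, incoming, outgoing


# Invented sample line: class 3, two incoming and two outgoing packets.
label, incoming, outgoing = parse_capture_line("3 -1500 64 -1200 80")
print(label, incoming, outgoing)
```

Incoming sizes are returned as positive magnitudes so that both lists can be compared directly.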

    Figure 4 Data

    This data uses specific lines from the Virtual Reality.txt file.

    The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

    The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

    The .xlsx and .csv files are identical.

    Each file includes (from right to left):

    The original packet data,

    each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

    and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 Graph.
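
A minimal sketch of the three steps above (sort the packet sizes, compute the mean and standard deviation, then build an empirical CDF). The sample sizes are invented, and the use of the sample standard deviation is an assumption, since the description does not say which variant the authors used.

```python
import statistics


def empirical_cdf(sizes):
    """Return (value, cumulative fraction) pairs over the sorted sizes."""
    ordered = sorted(sizes)  # smallest to largest packet size
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


sizes = [80, 64, 1500, 1200]          # invented packet sizes
mean = statistics.mean(sizes)
stdev = statistics.stdev(sizes)       # assumption: sample standard deviation
cdf = empirical_cdf(sizes)

print(mean, stdev)
print(cdf)
```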

  17. Business Website Data | 50 Countries Coverage | GDPR Compliant | 7,838,729...

    • datarade.ai
    .json, .csv
    Updated Aug 22, 2025
    Cite
    HitHorizons (2025). Business Website Data | 50 Countries Coverage | GDPR Compliant | 7,838,729 Websites [Dataset]. https://datarade.ai/data-products/business-website-data-48-countries-coverage-gdpr-complian-hithorizons
    Available download formats
    .json, .csv
    Dataset updated
    Aug 22, 2025
    Dataset authored and provided by
    HitHorizons
    Area covered
    Italy, United Kingdom, Luxembourg, Germany, France, Belgium, Estonia, Denmark
    Description

    The Business Websites Database of European Companies serves as an invaluable and comprehensive resource, meticulously curated to include an extensive and diverse collection of links directing users to the official websites of prominent and influential companies headquartered or operating within Europe. This database spans a wide array of industries and sectors, ranging from technology and finance to manufacturing, healthcare, retail, and beyond, ensuring that users have access to a broad spectrum of business information. By offering direct access to these companies' online platforms, the database not only facilitates seamless navigation to their digital presence but also provides users with the opportunity to explore detailed insights about their products, services, corporate values, and market activities, making it an essential tool for researchers, professionals, and anyone seeking to engage with the European business landscape.

  18. Deep web employee data leaks of selected e-commerce platforms 2024

    • statista.com
    Updated May 15, 2024
    Cite
    Statista (2024). Deep web employee data leaks of selected e-commerce platforms 2024 [Dataset]. https://www.statista.com/statistics/1350068/e-commerce-websites-deep-web-employee-credential-leaks/
    Dataset updated
    May 15, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 15, 2024
    Area covered
    Worldwide
    Description

    Company information such as employee credentials is one of the most common assets online vendors trade illegally on the darknet. According to the source, Zalando.com has suffered thousands of data leakage incidents on the deep web in the 12 months leading up to ********, in which more than ***** employee credentials were compromised. Amazon registered a relatively low number of deep web data leaks, with roughly *** in the last 12 months.

  19. Web Page Phishing Dataset

    • kaggle.com
    Updated Feb 25, 2024
    Cite
    Daniel Fernando (2024). Web Page Phishing Dataset [Dataset]. https://www.kaggle.com/datasets/danielfernandon/web-page-phishing-dataset
    Available download formats
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Feb 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Daniel Fernando
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Origin

    This dataset is the result of merging two datasets with identical features. However, not all features from the original datasets have been retained in the merged dataset. This selective feature inclusion was done to focus on the most relevant data and to avoid redundancy. The resulting dataset provides a comprehensive view of the shared characteristics between the two original datasets, while maintaining a streamlined and focused set of features.

    Dataset 1 : Web page phishing detection Hannousse, Abdelhakim; Yahiouche, Salima (2021), “Web page phishing detection”, Mendeley Data, V3, doi: 10.17632/c2gw7fy2j4.3

    Dataset 2: Phishing Websites Dataset Vrbančič, Grega (2020), “Phishing Websites Dataset”, Mendeley Data, V1, doi: 10.17632/72ptz43s9v.1

    Data Format

    The data is provided in CSV format, with each row representing a website and each column representing a feature. The last column contains the label for each website.

    Features

    This dataset contains the following features:

    1. url_length: The length of the URL.
    2. n_dots: The count of ‘.’ characters in the URL.
    3. n_hypens: The count of ‘-’ characters in the URL.
    4. n_underline: The count of ‘_’ characters in the URL.
    5. n_slash: The count of ‘/’ characters in the URL.
    6. n_questionmark: The count of ‘?’ characters in the URL.
    7. n_equal: The count of ‘=’ characters in the URL.
    8. n_at: The count of ‘@’ characters in the URL.
    9. n_and: The count of ‘&’ characters in the URL.
    10. n_exclamation: The count of ‘!’ characters in the URL.
    11. n_space: The count of ‘ ’ characters in the URL.
    12. n_tilde: The count of ‘~’ characters in the URL.
    13. n_comma: The count of ‘,’ characters in the URL.
    14. n_plus: The count of ‘+’ characters in the URL.
    15. n_asterisk: The count of ‘*’ characters in the URL.
    16. n_hastag: The count of ‘#’ characters in the URL.
    17. n_dollar: The count of ‘$’ characters in the URL.
    18. n_percent: The count of ‘%’ characters in the URL.
    19. n_redirection: The count of redirections in the URL.
    20. phishing: The label of the URL. 1 is phishing and 0 is legitimate.
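
The character-count features in the list above could be recomputed from a raw URL roughly as follows. This is a sketch under assumptions: the function name `url_features` and the sample URL are hypothetical, and `n_redirection` and `phishing` are left out because the description does not define how redirections are counted and the label is not derivable from the URL text.

```python
# Feature name -> character counted, using the column names from the list above.
FEATURE_CHARS = {
    "n_dots": ".", "n_hypens": "-", "n_underline": "_", "n_slash": "/",
    "n_questionmark": "?", "n_equal": "=", "n_at": "@", "n_and": "&",
    "n_exclamation": "!", "n_space": " ", "n_tilde": "~", "n_comma": ",",
    "n_plus": "+", "n_asterisk": "*", "n_hastag": "#", "n_dollar": "$",
    "n_percent": "%",
}


def url_features(url):
    """Return url_length plus one count per character-based feature."""
    features = {"url_length": len(url)}
    for name, char in FEATURE_CHARS.items():
        features[name] = url.count(char)
    return features


# Invented example URL.
features = url_features("http://example.com/a-b?x=1&y=2")
print(features)
```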

  20. CENSORED WEB-SITES BY ALL COUNTRIES

    • kaggle.com
    zip
    Updated Dec 23, 2021
    Cite
    Baris Dincer (2021). CENSORED WEB-SITES BY ALL COUNTRIES [Dataset]. https://www.kaggle.com/brsdincer/censored-websites-by-all-countries
    Available download formats
    zip (545580 bytes)
    Dataset updated
    Dec 23, 2021
    Authors
    Baris Dincer
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    CENSORED WEB-SITES BY ALL COUNTRIES

    Sites that were or are currently banned.

    This data was created by each country's own users.

    • Some of the listed sites may be active again.