100+ datasets found
  1. Web Scraping Data | Key Customers Domain Name Data | Scanning Logos found on...

    • datarade.ai
    .json
    Updated Jun 27, 2024
    Cite
    PredictLeads (2024). Web Scraping Data | Key Customers Domain Name Data | Scanning Logos found on Websites | 248M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-scraping-data-domain-name-data-business-predictleads
    Explore at:
    Available download formats: .json
    Dataset updated
    Jun 27, 2024
    Dataset authored and provided by
    PredictLeads
    Area covered
    Northern Mariana Islands, Benin, Colombia, Malaysia, Turkmenistan, Burkina Faso, Nigeria, Svalbard and Jan Mayen, Curaçao, Oman
    Description

    PredictLeads Key Customers Data provides essential business intelligence by analyzing company relationships, uncovering vendor partnerships, client connections, and strategic affiliations through advanced web scraping and logo recognition. This dataset captures business interactions directly from company websites, offering valuable insights into market positioning, competitive landscapes, and growth opportunities.

    Use Cases:

    ✅ Account Profiling – Gain a 360-degree customer view by mapping company relationships and partnerships.
    ✅ Competitive Intelligence – Track vendor-client connections and business affiliations to identify key industry players.
    ✅ B2B Lead Targeting – Prioritize leads based on their business relationships, improving sales and marketing efficiency.
    ✅ CRM Data Enrichment – Enhance company records with detailed key customer data, ensuring data accuracy.
    ✅ Market Research – Identify emerging trends and industry networks to optimize strategic planning.

    Key API Attributes:

    • id (string, UUID) – Unique identifier for the company connection.
    • category (string) – Type of relationship (e.g., vendor, client, partner).
    • source_category (string) – Where the connection was detected (e.g., partner page, case study).
    • source_url (string, URL) – Website where the relationship was found.
    • individual_source_url (string, URL) – Specific page confirming the connection.
    • context (string) – Extracted description of the business relationship (e.g., "Company X - partners with Company Y to enhance payment processing").
    • first_seen_at (ISO 8601 date-time) – Date the connection was first detected.
    • last_seen_at (ISO 8601 date-time) – Most recent confirmation of the relationship.
    • company1 & company2 (objects) – Details of the two connected companies, including:
        - domain (string) – Company website domain.
        - company_name (string) – Official company name.
        - ticker (string, nullable) – Stock ticker, if available.
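
    The attribute list above maps naturally onto a small typed record. The following is a minimal sketch, assuming each connection arrives as a flat JSON object keyed by exactly these attribute names; the real API response may nest or wrap them differently. The class names, the helper functions, and the sample record are all hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Company:
    domain: str
    company_name: str
    ticker: Optional[str]  # nullable in the attribute list


@dataclass
class Connection:
    id: str
    category: str
    source_category: str
    source_url: str
    individual_source_url: str
    context: str
    first_seen_at: datetime
    last_seen_at: datetime
    company1: Company
    company2: Company


def parse_timestamp(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_connection(record: dict) -> Connection:
    """Build a Connection from a flat dict (layout assumed, not confirmed)."""
    fields = dict(record)
    for key in ("first_seen_at", "last_seen_at"):
        fields[key] = parse_timestamp(fields[key])
    for key in ("company1", "company2"):
        fields[key] = Company(**fields[key])
    return Connection(**fields)


# Made-up sample record, for illustration only.
SAMPLE = json.loads("""
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "category": "partner",
  "source_category": "partner_page",
  "source_url": "https://example.com",
  "individual_source_url": "https://example.com/partners",
  "context": "Company X partners with Company Y",
  "first_seen_at": "2024-01-15T00:00:00Z",
  "last_seen_at": "2024-06-01T00:00:00Z",
  "company1": {"domain": "example.com", "company_name": "Company X", "ticker": "CMPX"},
  "company2": {"domain": "example.org", "company_name": "Company Y", "ticker": null}
}
""")

connection = parse_connection(SAMPLE)
print(connection.company1.domain, "->", connection.company2.domain)
```

    Parsing the two timestamps into datetime objects makes it straightforward to compare first_seen_at and last_seen_at when filtering for recently confirmed relationships.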

    📌 PredictLeads Key Customers Data is an indispensable tool for B2B sales, marketing, and market intelligence teams, providing actionable relationship insights to drive targeted outreach, competitor tracking, and strategic decision-making.

    PredictLeads Docs: https://docs.predictleads.com/v3/guide/connections_dataset

  2. Website Statistics

    • data.wu.ac.at
    • data.europa.eu
    csv, pdf
    Updated Jun 11, 2018
    Cite
    Lincolnshire County Council (2018). Website Statistics [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/M2ZkZDBjOTUtMzNhYi00YWRjLWI1OWMtZmUzMzA5NjM0ZTdk
    Explore at:
    Available download formats: csv, pdf
    Dataset updated
    Jun 11, 2018
    Dataset provided by
    Lincolnshire County Council (http://www.lincolnshire.gov.uk/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.

    • Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.

    • Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.

    • Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.

    • Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.

      Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with JavaScript enabled, which excludes web crawlers and API calls.

    These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.

  3. Websites using data-urls

    • webtechsurvey.com
    csv
    Updated Feb 10, 2025
    + more versions
    Cite
    WebTechSurvey (2025). Websites using data-urls [Dataset]. https://webtechsurvey.com/technology/data-urls
    Explore at:
    Available download formats: csv
    Dataset updated
    Feb 10, 2025
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the data-urls technology, compiled through global website indexing conducted by WebTechSurvey.

  4. Network Traffic Analysis: Data and Code

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 12, 2024
    + more versions
    Cite
    Honig, Joshua (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
    Explore at:
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Ferrell, Nathan
    Homan, Sophia
    Moran, Madeline
    Chan-Tin, Eric
    Soni, Shreena
    Honig, Joshua
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Code:

    Packet_Features_Generator.py & Features.py

    To run this code:

    pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

    -h, --help    show this help message and exit
    -i TXTFILE    input text file
    -x X          Add first X number of total packets as features.
    -y Y          Add first Y number of negative packets as features.
    -z Z          Add first Z number of positive packets as features.
    -ml           Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
    -s S          Generate samples using size s.
    -j

    Purpose:

    Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

    Uses Features.py to calculate the features.
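
    To make the -x, -y and -z options concrete, here is a minimal sketch of "first N packets as features". It is not the code from Features.py: the function names are hypothetical, and right-padding short captures with zeros is an assumption made so that every sample has the same number of features.

```python
from typing import List


def first_n(values: List[int], n: int, pad: int = 0) -> List[int]:
    """Return the first n values, right-padded with `pad` up to length n."""
    head = values[:n]
    return head + [pad] * (n - len(head))


def packet_features(packets: List[int], x: int, y: int, z: int) -> List[int]:
    """Concatenate the first x packets overall, first y negative, first z positive."""
    negative = [p for p in packets if p < 0]
    positive = [p for p in packets if p > 0]
    return first_n(packets, x) + first_n(negative, y) + first_n(positive, z)


# Made-up capture: signed packet sizes in arrival order.
features = packet_features([-1500, 60, -1500, 52, -640], x=3, y=2, z=3)
print(features)
```

    In the example only two positive packets exist, so the last of the three positive-packet features is filled with the padding value.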

    startMachineLearning.sh & machineLearning.py

    To run this code:

    bash startMachineLearning.sh

    This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.

    Options (to be edited within this file):

    --evaluate-only to test 5 fold cross validation accuracy

    --test-scaling-normalization to test 6 different combinations of scalers and normalizers

    Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

    --grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

    Purpose:

    Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and provides results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.
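
    For readers unfamiliar with 5-fold cross validation, the sketch below shows how the sample indices can be split so that every sample is used for testing exactly once. It is a plain-Python illustration with a hypothetical function name, not the splitting logic in machineLearning.py, and it omits the shuffling and stratification a real evaluation would normally add.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(12, k=5))
for train, test in folds:
    print(len(train), "train samples, test indices:", test)
```

    The reported accuracy would then be the average of the scores obtained on the five held-out folds.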

    Data

    Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

    Data for this experiment was stored and analyzed in the form of a txt file for each experiment. Each line contains:

    • First, a classification number denoting which website, query, or VR action is taking place.
    • The remaining numbers in each line denote the size of a packet and the direction it is traveling:
        - negative numbers denote incoming packets
        - positive numbers denote outgoing packets
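
    The line format described above can be read with a few lines of Python. This is a sketch under one assumption: the description does not name the field separator, so the parser accepts whitespace and/or commas. The Capture type and parse_capture function are hypothetical names.

```python
import re
from typing import List, NamedTuple


class Capture(NamedTuple):
    label: int            # classification number (website, query, or VR action)
    incoming: List[int]   # sizes of packets stored as negative numbers
    outgoing: List[int]   # sizes of packets stored as positive numbers


def parse_capture(line: str) -> Capture:
    # Assumption: fields are separated by whitespace and/or commas.
    fields = [int(tok) for tok in re.split(r"[,\s]+", line.strip()) if tok]
    label, packets = fields[0], fields[1:]
    incoming = [-p for p in packets if p < 0]
    outgoing = [p for p in packets if p > 0]
    return Capture(label, incoming, outgoing)


# Made-up line: label 3 followed by four signed packet sizes.
capture = parse_capture("3 -1500 60 -1500 52")
print(capture)
```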

    Figure 4 Data

    This data uses specific lines from the Virtual Reality.txt file.

    The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

    The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

    The .xlsx and .csv files are identical.

    Each file includes (from right to left):

    The original packet data,

    each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

    and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 Graph.
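
    The steps listed above (sort, mean, standard deviation, CDF) can be sketched as follows. The description does not say whether the Figure 4 CDF is empirical or a normal CDF fitted from the mean and standard deviation, so this sketch computes both; the function name and the packet sizes are made up for illustration.

```python
from statistics import NormalDist, mean, stdev
from typing import List, Tuple


def empirical_cdf(sizes: List[int]) -> List[Tuple[int, float]]:
    """Return (size, fraction of packets <= size) pairs in ascending order."""
    ordered = sorted(sizes)
    n = len(ordered)
    cdf = {}
    for rank, size in enumerate(ordered, start=1):
        cdf[size] = rank / n  # later duplicates overwrite earlier ranks
    return list(cdf.items())


sizes = [60, 1500, 52, 1500]           # made-up packet sizes
mu, sigma = mean(sizes), stdev(sizes)  # sample mean and standard deviation

print(empirical_cdf(sizes))

# Alternative reading: a normal CDF parameterised by the mean and deviation.
fitted = NormalDist(mu, sigma)
print([(s, round(fitted.cdf(s), 3)) for s in sorted(set(sizes))])
```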

  5. Website Analytics

    • data.nola.gov
    • gimi9.com
    • +4more
    csv, xlsx, xml
    Updated Feb 2, 2017
    Cite
    Information Technology and Innovation Web Team (2017). Website Analytics [Dataset]. https://data.nola.gov/City-Administration/Website-Analytics/62d3-pst8
    Explore at:
    Available download formats: xlsx, csv, xml
    Dataset updated
    Feb 2, 2017
    Dataset authored and provided by
    Information Technology and Innovation Web Team
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This data about nola.gov provides a window into how people are interacting with the City of New Orleans online. The data comes from a unified Google Analytics account for New Orleans. We do not track individuals and we anonymize the IP addresses of all visitors.

  6. Altosight | AI Custom Web Scraping Data | 100% Global | Free Unlimited Data...

    • datarade.ai
    .json, .csv, .xls
    Updated Sep 7, 2024
    Cite
    Altosight (2024). Altosight | AI Custom Web Scraping Data | 100% Global | Free Unlimited Data Points | Bypassing All CAPTCHAs & Blocking Mechanisms | GDPR Compliant [Dataset]. https://datarade.ai/data-products/altosight-ai-custom-web-scraping-data-100-global-free-altosight
    Explore at:
    Available download formats: .json, .csv, .xls
    Dataset updated
    Sep 7, 2024
    Dataset authored and provided by
    Altosight
    Area covered
    Chile, Svalbard and Jan Mayen, Wallis and Futuna, Singapore, Côte d'Ivoire, Greenland, Czech Republic, Tajikistan, Guatemala, Paraguay
    Description

    Altosight | AI Custom Web Scraping Data

    ✦ Altosight provides global web scraping data services with AI-powered technology that bypasses CAPTCHAs, blocking mechanisms, and handles dynamic content.

    We extract data from marketplaces like Amazon, aggregators, e-commerce, and real estate websites, ensuring comprehensive and accurate results.

    ✦ Our solution offers free unlimited data points across any project, with no additional setup costs.

    We deliver data through flexible methods such as API, CSV, JSON, and FTP, all at no extra charge.

    ― Key Use Cases ―

    ➤ Price Monitoring & Repricing Solutions

    🔹 Automatic repricing, AI-driven repricing, and custom repricing rules
    🔹 Receive price suggestions via API or CSV to stay competitive
    🔹 Track competitors in real-time or at scheduled intervals

    ➤ E-commerce Optimization

    🔹 Extract product prices, reviews, ratings, images, and trends
    🔹 Identify trending products and enhance your e-commerce strategy
    🔹 Build dropshipping tools or marketplace optimization platforms with our data

    ➤ Product Assortment Analysis

    🔹 Extract the entire product catalog from competitor websites
    🔹 Analyze product assortment to refine your own offerings and identify gaps
    🔹 Understand competitor strategies and optimize your product lineup

    ➤ Marketplaces & Aggregators

    🔹 Crawl entire product categories and track best-sellers
    🔹 Monitor position changes across categories
    🔹 Identify which eRetailers sell specific brands and which SKUs for better market analysis

    ➤ Business Website Data

    🔹 Extract detailed company profiles, including financial statements, key personnel, industry reports, and market trends, enabling in-depth competitor and market analysis

    🔹 Collect customer reviews and ratings from business websites to analyze brand sentiment and product performance, helping businesses refine their strategies

    ➤ Domain Name Data

    🔹 Access comprehensive data, including domain registration details, ownership information, expiration dates, and contact information. Ideal for market research, brand monitoring, lead generation, and cybersecurity efforts

    ➤ Real Estate Data

    🔹 Access property listings, prices, and availability
    🔹 Analyze trends and opportunities for investment or sales strategies

    ― Data Collection & Quality ―

    ► Publicly Sourced Data: Altosight collects web scraping data from publicly available websites, online platforms, and industry-specific aggregators

    ► AI-Powered Scraping: Our technology handles dynamic content, JavaScript-heavy sites, and pagination, ensuring complete data extraction

    ► High Data Quality: We clean and structure unstructured data, ensuring it is reliable, accurate, and delivered in formats such as API, CSV, JSON, and more

    ► Industry Coverage: We serve industries including e-commerce, real estate, travel, finance, and more. Our solution supports use cases like market research, competitive analysis, and business intelligence

    ► Bulk Data Extraction: We support large-scale data extraction from multiple websites, allowing you to gather millions of data points across industries in a single project

    ► Scalable Infrastructure: Our platform is built to scale with your needs, allowing seamless extraction for projects of any size, from small pilot projects to ongoing, large-scale data extraction

    ― Why Choose Altosight? ―

    ✔ Unlimited Data Points: Altosight offers unlimited free attributes, meaning you can extract as many data points from a page as you need without extra charges

    ✔ Proprietary Anti-Blocking Technology: Altosight utilizes proprietary techniques to bypass blocking mechanisms, including CAPTCHAs, Cloudflare, and other obstacles. This ensures uninterrupted access to data, no matter how complex the target websites are

    ✔ Flexible Across Industries: Our crawlers easily adapt across industries, including e-commerce, real estate, finance, and more. We offer customized data solutions tailored to specific needs

    ✔ GDPR & CCPA Compliance: Your data is handled securely and ethically, ensuring compliance with GDPR, CCPA and other regulations

    ✔ No Setup or Infrastructure Costs: Start scraping without worrying about additional costs. We provide a hassle-free experience with fast project deployment

    ✔ Free Data Delivery Methods: Receive your data via API, CSV, JSON, or FTP at no extra charge. We ensure seamless integration with your systems

    ✔ Fast Support: Our team is always available via phone and email, resolving over 90% of support tickets within the same day

    ― Custom Projects & Real-Time Data ―

    ✦ Tailored Solutions: Every business has unique needs, which is why Altosight offers custom data projects. Contact us for a feasibility analysis, and we’ll design a solution that fits your goals

    ✦ Real-Time Data: Whether you need real-time data delivery or scheduled updates, we provide the flexibility to receive data when you need it. Track price changes, monitor product trends, or gather...

  7. NYC Open Data Plan: Website Data

    • data.cityofnewyork.us
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • +1more
    application/rdfxml +5
    Updated Oct 28, 2024
    Cite
    Office of Technology and Innovation (OTI) (2024). NYC Open Data Plan: Website Data [Dataset]. https://data.cityofnewyork.us/City-Government/NYC-Open-Data-Plan-Website-Data/duz4-2gn9
    Explore at:
    Available download formats: application/rdfxml, csv, application/rssxml, tsv, xml, json
    Dataset updated
    Oct 28, 2024
    Dataset provided by
    New York City Office of Technology and Innovation (https://www.nyc.gov/content/oti/pages/)
    Authors
    Office of Technology and Innovation (OTI)
    Description

    NOTE: To review the latest plan, make sure to filter the "Report Year" column to the latest year.

    Data on public websites maintained by or on behalf of the city agencies.

  8. 1950 Census: Official 1950 Census Website

    • catalog.data.gov
    • datasets.ai
    Updated Mar 11, 2023
    + more versions
    Cite
    Office of Innovation (2023). 1950 Census: Official 1950 Census Website [Dataset]. https://catalog.data.gov/dataset/1950-census-official-1950-census-website
    Explore at:
    Dataset updated
    Mar 11, 2023
    Dataset provided by
    Office of Innovation
    Description

    Website allows the public full access to the 1950 Census images, census maps and descriptions.

  9. Hilco Streambank | Web Hosting & Domain Names | Technology Data

    • datastore.forage.ai
    Updated Nov 20, 2024
    Cite
    (2024). Hilco Streambank | Web Hosting & Domain Names | Technology Data [Dataset]. https://datastore.forage.ai/searchresults/?resource_keyword=web
    Explore at:
    Dataset updated
    Nov 20, 2024
    Description

    Hilco Streambank is a trusted marketplace leader dedicated to reliable and transparent service. As the world's largest IPv4 address broker, Hilco Streambank has successfully completed more transfers than any other organization, worldwide, with over $0 billion generated for clients since 2014. The company's team has extensive experience in regional internet registry transfer regulations and provides buyers and sellers with expert advice to help reach a deal that meets even the most complex of needs.

    Hilco Streambank's online marketplace provides a streamlined and transparent process to transfer the rights to IPv4 assets, including buyer and seller checklists, private brokered solutions, and LEASE IPv4 options. The company also offers the IPv4 Analyzer widget and its ReView digital IP address audit tool, a free tool working with 6connect. With an operating presence in all five regional internet registries (ARIN, APNIC, RIPE, LACNIC, and AFRINIC), Hilco Streambank is well-positioned to facilitate IPv4 transactions worldwide.

  10. Websites using Experian Data Quality

    • webtechsurvey.com
    csv
    Updated Jul 3, 2025
    Cite
    WebTechSurvey (2025). Websites using Experian Data Quality [Dataset]. https://webtechsurvey.com/technology/experian-data-quality
    Explore at:
    Available download formats: csv
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Experian Data Quality technology, compiled through global website indexing conducted by WebTechSurvey.

  11. Websites using Corona Virus Data

    • webtechsurvey.com
    csv
    Updated Nov 16, 2024
    Cite
    WebTechSurvey (2024). Websites using Corona Virus Data [Dataset]. https://webtechsurvey.com/technology/corona-virus-data
    Explore at:
    Available download formats: csv
    Dataset updated
    Nov 16, 2024
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Corona Virus Data technology, compiled through global website indexing conducted by WebTechSurvey.

  12. Amazon Web Services Public Data Sets

    • neuinfo.org
    • dknet.org
    • +1more
    Updated Jan 29, 2022
    + more versions
    Cite
    (2022). Amazon Web Services Public Data Sets [Dataset]. http://identifiers.org/RRID:SCR_006318
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    A multidisciplinary repository of public data sets such as the Human Genome and US Census data that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community. Anyone can access these data sets from their Amazon Elastic Compute Cloud (Amazon EC2) instances and start computing on the data within minutes. Users can also leverage the entire AWS ecosystem and easily collaborate with other AWS users. If you have a public domain or non-proprietary data set that you think is useful and interesting to the AWS community, please submit a request and the AWS team will review your submission and get back to you. Typically the data sets in the repository are between 1 GB and 1 TB in size (based on the Amazon EBS volume limit), but the AWS team can work with you to host larger data sets as well. You must have the right to make the data freely available.

  13. Data used for personalization on e-commerce websites U.S. and UK 2020

    • statista.com
    Updated Jul 9, 2025
    + more versions
    Cite
    Statista (2025). Data used for personalization on e-commerce websites U.S. and UK 2020 [Dataset]. https://www.statista.com/statistics/1211718/data-personalization-ecommerce-website-us-uk/
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 6, 2020 - Jul 20, 2020
    Area covered
    United Kingdom, United States
    Description

    During a study conducted among e-commerce professionals in the UK and the U.S. in *********, respondents were asked about their use of personalization on their websites. According to the results, ** percent of survey participants were already using real-time behavioral data to personalize user experience on their e-commerce websites.

  14. WP-Script | Web Hosting & Domain Names | Technology Data

    • datastore.forage.ai
    Updated Nov 20, 2024
    Cite
    (2024). WP-Script | Web Hosting & Domain Names | Technology Data [Dataset]. https://datastore.forage.ai/searchresults/?resource_keyword=web
    Explore at:
    Dataset updated
    Nov 20, 2024
    Description

    WP-Script is a company that provides WordPress themes and plugins for creating adult sites. They offer a range of products, including seven customizable adult WordPress themes and thirteen powerful adult WordPress plugins. Their products are designed to be easy to use and can help entrepreneurs create professional-looking adult sites with minimal technical expertise.

    With WP-Script, you can start your adult site in six easy steps. They also offer a 14-day money-back guarantee, giving you the opportunity to test their products risk-free. Additionally, they provide premium support to help you resolve any issues you may encounter. Their customers love their products, citing excellent themes, easy installation, and good customer support.

  15. Global Data Scraping Tools Market Research Report: By Deployment Mode...

    • wiseguyreports.com
    Updated Jul 23, 2024
    Cite
    Wiseguy Research Consultants Pvt Ltd (2024). Global Data Scraping Tools Market Research Report: By Deployment Mode (Cloud, Web, On-Premises), By Data Source (Websites, Social Media, E-commerce Platforms, Databases, Flat Files), By Extraction Type (Structured Data, Semi-Structured Data, Unstructured Data), By Cloud Type (SaaS, PaaS, IaaS), By Application (Market Research, Price Monitoring, Lead Generation, Sentiment Analysis, Data Integration) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/reports/data-scraping-tools-market
    Explore at:
    Dataset updated
    Jul 23, 2024
    Dataset authored and provided by
    Wiseguy Research Consultants Pvt Ltd
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Jan 7, 2024
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2024
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2023: 3.24 (USD Billion)
    MARKET SIZE 2024: 3.73 (USD Billion)
    MARKET SIZE 2032: 11.46 (USD Billion)
    SEGMENTS COVERED: Deployment Mode, Data Source, Extraction Type, Cloud Type, Application, Regional
    COUNTRIES COVERED: North America, Europe, APAC, South America, MEA
    KEY MARKET DYNAMICS: 1. AI-powered data extraction 2. Growing demand for structured data 3. Cloud-based data scraping services 4. Real-time web data extraction 5. Increased use of web scraping for business intelligence
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Dexi.io, Cheerio, ScrapingBee, Import.io, Scrapinghub, 80legs, Bright Data, Mozenda, Phantombuster, Helium Scraper, ScraperAPI, Octoparse, Apify, ParseHub, Diffbot
    MARKET FORECAST PERIOD: 2024 - 2032
    KEY MARKET OPPORTUNITIES: Automation for efficient data collection; Real-time data extraction for enhanced decision-making; Cloud-based tools for scalability and flexibility; AI-powered tools for advanced data analysis; Increased demand for web scraping in various industries
    COMPOUND ANNUAL GROWTH RATE (CAGR): 15.06% (2024 - 2032)
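
    The stated growth rate can be checked against the two market-size figures with the standard compound annual growth rate formula, (end / start) ** (1 / years) - 1. The sketch below assumes the forecast period spans eight compounding years (2024 to 2032); the function name is hypothetical.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in USD billion, taken from the report summary above.
rate = cagr(start_value=3.73, end_value=11.46, years=2032 - 2024)
print(f"{rate:.2%}")
```

    The result comes out at about 15.06%, which agrees with the reported figure.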
  16. Data from: GIS Web Services

    • catalog.data.gov
    • data.brla.gov
    • +2more
    Updated Sep 15, 2023
    Cite
    data.brla.gov (2023). GIS Web Services [Dataset]. https://catalog.data.gov/dataset/gis-web-services
    Explore at:
    Dataset updated
    Sep 15, 2023
    Dataset provided by
    data.brla.gov
    Description

    A listing of web services published from the authoritative East Baton Rouge Parish Geographic Information System (EBRGIS) data repository. Services are offered in Esri REST, and the Open Geospatial Consortium (OGC) Web Mapping Service (WMS) or Web Feature Service (WFS) formats.

  17. Websites using Custom Searchable Data Entry System

    • webtechsurvey.com
    csv
    Updated Jan 13, 2025
    Cite
    WebTechSurvey (2025). Websites using Custom Searchable Data Entry System [Dataset]. https://webtechsurvey.com/technology/custom-searchable-data-entry-system
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 13, 2025
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Custom Searchable Data Entry System technology, compiled through global website indexing conducted by WebTechSurvey.

  18. Phishing website Detector

    • kaggle.com
    zip
    Updated Feb 28, 2020
    + more versions
    Cite
    Eswar Chand (2020). Phishing website Detector [Dataset]. https://www.kaggle.com/eswarchandt/phishing-website-detector
    Explore at:
    Available download formats: zip (89610 bytes)
    Dataset updated
    Feb 28, 2020
    Authors
    Eswar Chand
    Description

    The dataset is a text file which provides the following resources that can be used as inputs for model building :

    1. A collection of website URLs for 11000+ websites. Each sample has 30 website parameters and a class label identifying it as a phishing website or not (1 or -1).

    2. The code template containing these code blocks: a. Import modules (Part 1) b. Load data function + input/output field descriptions

    The dataset also serves as an input for project scoping and tries to specify the functional and non-functional requirements for it.

    Background of Problem Statement :

    You are expected to write the code for a binary classification model (phishing website or not) using Python Scikit-Learn that trains on the data and calculates the accuracy score on the test data. You have to use one or more of the classification algorithms to train a model on the phishing website data set.

    Dataset Description:

    1. The dataset is a “.txt” file with no headers and has only the column values.
    2. The actual column-wise header is described above and, if needed, you can add the header manually.
    3. The header list (column names) is as follows : [ 'UsingIP', 'LongURL', 'ShortURL', 'Symbol@', 'Redirecting//', 'PrefixSuffix-', 'SubDomains', 'HTTPS', 'DomainRegLen', 'Favicon', 'NonStdPort', 'HTTPSDomainURL', 'RequestURL', 'AnchorURL', 'LinksInScriptTags', 'ServerFormHandler', 'InfoEmail', 'AbnormalURL', 'WebsiteForwarding', 'StatusBarCust', 'DisableRightClick', 'UsingPopupWindow', 'IframeRedirection', 'AgeofDomain', 'DNSRecording', 'WebsiteTraffic', 'PageRank', 'GoogleIndex', 'LinksPointingToPage', 'StatsReport', 'class' ]

    Brief Description of the features in data set

    ● UsingIP (categorical - signed numeric) : { -1,1 }
    ● LongURL (categorical - signed numeric) : { 1,0,-1 }
    ● ShortURL (categorical - signed numeric) : { 1,-1 }
    ● Symbol@ (categorical - signed numeric) : { 1,-1 }
    ● Redirecting// (categorical - signed numeric) : { -1,1 }
    ● PrefixSuffix- (categorical - signed numeric) : { -1,1 }
    ● SubDomains (categorical - signed numeric) : { -1,0,1 }
    ● HTTPS (categorical - signed numeric) : { -1,1,0 }
    ● DomainRegLen (categorical - signed numeric) : { -1,1 }
    ● Favicon (categorical - signed numeric) : { 1,-1 }
    ● NonStdPort (categorical - signed numeric) : { 1,-1 }
    ● HTTPSDomainURL (categorical - signed numeric) : { -1,1 }
    ● RequestURL (categorical - signed numeric) : { 1,-1 }
    ● AnchorURL (categorical - signed numeric) :
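
    Because the file ships without a header row, the column names listed above have to be attached when loading it. The following is a minimal sketch using only the standard library; it assumes the values are comma-separated integers, which the description does not state, and the load_rows function and the two sample rows are made up for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO

# Column names copied from the header list in the dataset description.
HEADER = [
    'UsingIP', 'LongURL', 'ShortURL', 'Symbol@', 'Redirecting//',
    'PrefixSuffix-', 'SubDomains', 'HTTPS', 'DomainRegLen', 'Favicon',
    'NonStdPort', 'HTTPSDomainURL', 'RequestURL', 'AnchorURL',
    'LinksInScriptTags', 'ServerFormHandler', 'InfoEmail', 'AbnormalURL',
    'WebsiteForwarding', 'StatusBarCust', 'DisableRightClick',
    'UsingPopupWindow', 'IframeRedirection', 'AgeofDomain', 'DNSRecording',
    'WebsiteTraffic', 'PageRank', 'GoogleIndex', 'LinksPointingToPage',
    'StatsReport', 'class',
]


def load_rows(handle: TextIO) -> List[Dict[str, int]]:
    """Read headerless rows and key each value by its column name."""
    rows = []
    for values in csv.reader(handle):  # assumption: comma-delimited
        if not values:
            continue  # skip blank lines
        if len(values) != len(HEADER):
            raise ValueError(f"expected {len(HEADER)} values, got {len(values)}")
        rows.append(dict(zip(HEADER, map(int, values))))
    return rows


# Two made-up rows: 30 feature values followed by the class label.
sample = io.StringIO(
    ",".join(["1"] * 30 + ["-1"]) + "\n" + ",".join(["-1"] * 30 + ["1"]) + "\n"
)
rows = load_rows(sample)
print(len(rows), "rows; first label:", rows[0]["class"])
```

    With the rows keyed by name, the first 30 columns can be passed to a classifier as features and the 'class' column used as the label.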
  19. data.ai Website Traffic, Ranking, Analytics [July 2025]

    • semrush.com
    Updated Aug 12, 2025
    Cite
    Semrush (2025). data.ai Website Traffic, Ranking, Analytics [June 2025] [Dataset]. https://www.semrush.com/website/data.ai/overview/
    Dataset updated
    Aug 12, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://www.semrush.com/company/legal/terms-of-service/

    Time period covered
    Aug 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    data.ai is ranked #59645 in the US with 331.92K traffic. Learn more about website traffic, market share, and more!

  20. Web Activity Data | USA | 5B records | Interests, Demographics | Email MAIDs...

    • datarade.ai
    Updated Nov 15, 2024
    Cite
    VisitIQ™ (2024). Web Activity Data | USA | 5B records | Interests, Demographics | Email MAIDs HEMs IP Address | Cookieless [Dataset]. https://datarade.ai/data-products/visitiq-s-web-activity-data-usa-5b-records-interests-visitiq
    Available download formats: .json, .csv, .xls, .txt
    Dataset updated
    Nov 15, 2024
    Dataset authored and provided by
    VisitIQ™
    Area covered
    United States
    Description

    Businesses, researchers, and developers often seek out web activity datasets and databases to understand consumer behavior, train machine learning models, perform market research or competitor analysis, optimize user experience on websites, and personalize content and advertising. This data can be used for a variety of different use cases

Cite
PredictLeads (2024). Web Scraping Data | Key Customers Domain Name Data | Scanning Logos found on Websites | 248M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-scraping-data-domain-name-data-business-predictleads

Web Scraping Data | Key Customers Domain Name Data | Scanning Logos found on Websites | 248M+ Records

Available download formats: .json
Dataset updated
Jun 27, 2024
Dataset authored and provided by
PredictLeads
Area covered
Northern Mariana Islands, Benin, Colombia, Malaysia, Turkmenistan, Burkina Faso, Nigeria, Svalbard and Jan Mayen, Curaçao, Oman
Description

PredictLeads Key Customers Data provides essential business intelligence by analyzing company relationships, uncovering vendor partnerships, client connections, and strategic affiliations through advanced web scraping and logo recognition. This dataset captures business interactions directly from company websites, offering valuable insights into market positioning, competitive landscapes, and growth opportunities.

Use Cases:

✅ Account Profiling – Gain a 360-degree customer view by mapping company relationships and partnerships.
✅ Competitive Intelligence – Track vendor-client connections and business affiliations to identify key industry players.
✅ B2B Lead Targeting – Prioritize leads based on their business relationships, improving sales and marketing efficiency.
✅ CRM Data Enrichment – Enhance company records with detailed key customer data, ensuring data accuracy.
✅ Market Research – Identify emerging trends and industry networks to optimize strategic planning.

Key API Attributes:

  • id (string, UUID) – Unique identifier for the company connection.
  • category (string) – Type of relationship (e.g., vendor, client, partner).
  • source_category (string) – Where the connection was detected (e.g., partner page, case study).
  • source_url (string, URL) – Website where the relationship was found.
  • individual_source_url (string, URL) – Specific page confirming the connection.
  • context (string) – Extracted description of the business relationship (e.g., "Company X - partners with Company Y to enhance payment processing").
  • first_seen_at (ISO 8601 date-time) – Date the connection was first detected.
  • last_seen_at (ISO 8601 date-time) – Most recent confirmation of the relationship.
  • company1 & company2 (objects) – Details of the two connected companies, including:
      - domain (string) – Company website domain.
      - company_name (string) – Official company name.
      - ticker (string, nullable) – Stock ticker, if available.
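
The attribute list above can be sketched as typed Python records. This is only an illustration under stated assumptions: the field names and types come from the list, but the flat JSON layout, the `Company`/`Connection` class names, the `parse_connection` helper, and every sample value are hypothetical and may not match the shape the PredictLeads API actually returns.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Company:
    domain: str
    company_name: str
    ticker: Optional[str]  # nullable per the attribute list


@dataclass
class Connection:
    id: str
    category: str
    source_category: str
    source_url: str
    individual_source_url: str
    context: str
    first_seen_at: datetime
    last_seen_at: datetime
    company1: Company
    company2: Company


def parse_connection(record: dict) -> Connection:
    """Build a Connection from a dict keyed by the attribute names above.

    Assumption: the record is a flat JSON object, and timestamps carry an
    explicit UTC offset such as +00:00.
    """
    return Connection(
        id=record["id"],
        category=record["category"],
        source_category=record["source_category"],
        source_url=record["source_url"],
        individual_source_url=record["individual_source_url"],
        context=record["context"],
        first_seen_at=datetime.fromisoformat(record["first_seen_at"]),
        last_seen_at=datetime.fromisoformat(record["last_seen_at"]),
        company1=Company(**record["company1"]),
        company2=Company(**record["company2"]),
    )


# Made-up record for illustration; none of these values come from the dataset.
raw = json.loads("""
{
  "id": "00000000-0000-0000-0000-000000000000",
  "category": "partner",
  "source_category": "partner page",
  "source_url": "https://example.com",
  "individual_source_url": "https://example.com/partners",
  "context": "Company X partners with Company Y",
  "first_seen_at": "2024-01-15T00:00:00+00:00",
  "last_seen_at": "2024-06-27T00:00:00+00:00",
  "company1": {"domain": "example.com", "company_name": "Company X", "ticker": null},
  "company2": {"domain": "example.org", "company_name": "Company Y", "ticker": null}
}
""")
conn = parse_connection(raw)
```

Parsing the two timestamps into `datetime` objects makes it straightforward to filter relationships by how recently they were confirmed, for example by comparing `conn.last_seen_at` against a cutoff date.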

📌 PredictLeads Key Customers Data is an indispensable tool for B2B sales, marketing, and market intelligence teams, providing actionable relationship insights to drive targeted outreach, competitor tracking, and strategic decision-making.

PredictLeads Docs: https://docs.predictleads.com/v3/guide/connections_dataset
