PredictLeads Key Customers Data provides essential business intelligence by analyzing company relationships, uncovering vendor partnerships, client connections, and strategic affiliations through advanced web scraping and logo recognition. This dataset captures business interactions directly from company websites, offering valuable insights into market positioning, competitive landscapes, and growth opportunities.
Use Cases:
✅ Account Profiling – Gain a 360-degree customer view by mapping company relationships and partnerships. ✅ Competitive Intelligence – Track vendor-client connections and business affiliations to identify key industry players. ✅ B2B Lead Targeting – Prioritize leads based on their business relationships, improving sales and marketing efficiency. ✅ CRM Data Enrichment – Enhance company records with detailed key customer data, ensuring data accuracy. ✅ Market Research – Identify emerging trends and industry networks to optimize strategic planning.
Key API Attributes:
📌 PredictLeads Key Customers Data is an indispensable tool for B2B sales, marketing, and market intelligence teams, providing actionable relationship insights to drive targeted outreach, competitor tracking, and strategic decision-making.
PredictLeads Docs: https://docs.predictleads.com/v3/guide/connections_dataset
Altosight | AI Custom Web Scraping Data
✦ Altosight provides global web scraping data services with AI-powered technology that bypasses CAPTCHAs, blocking mechanisms, and handles dynamic content.
We extract data from marketplaces like Amazon, aggregators, e-commerce, and real estate websites, ensuring comprehensive and accurate results.
✦ Our solution offers free unlimited data points across any project, with no additional setup costs.
We deliver data through flexible methods such as API, CSV, JSON, and FTP, all at no extra charge.
― Key Use Cases ―
➤ Price Monitoring & Repricing Solutions
🔹 Automatic repricing, AI-driven repricing, and custom repricing rules 🔹 Receive price suggestions via API or CSV to stay competitive 🔹 Track competitors in real-time or at scheduled intervals
➤ E-commerce Optimization
🔹 Extract product prices, reviews, ratings, images, and trends 🔹 Identify trending products and enhance your e-commerce strategy 🔹 Build dropshipping tools or marketplace optimization platforms with our data
➤ Product Assortment Analysis
🔹 Extract the entire product catalog from competitor websites 🔹 Analyze product assortment to refine your own offerings and identify gaps 🔹 Understand competitor strategies and optimize your product lineup
➤ Marketplaces & Aggregators
🔹 Crawl entire product categories and track best-sellers 🔹 Monitor position changes across categories 🔹 Identify which eRetailers sell specific brands and which SKUs for better market analysis
➤ Business Website Data
🔹 Extract detailed company profiles, including financial statements, key personnel, industry reports, and market trends, enabling in-depth competitor and market analysis
🔹 Collect customer reviews and ratings from business websites to analyze brand sentiment and product performance, helping businesses refine their strategies
➤ Domain Name Data
🔹 Access comprehensive data, including domain registration details, ownership information, expiration dates, and contact information. Ideal for market research, brand monitoring, lead generation, and cybersecurity efforts
➤ Real Estate Data
🔹 Access property listings, prices, and availability 🔹 Analyze trends and opportunities for investment or sales strategies
― Data Collection & Quality ―
► Publicly Sourced Data: Altosight collects web scraping data from publicly available websites, online platforms, and industry-specific aggregators
► AI-Powered Scraping: Our technology handles dynamic content, JavaScript-heavy sites, and pagination, ensuring complete data extraction
► High Data Quality: We clean and structure unstructured data, ensuring it is reliable, accurate, and delivered in formats such as API, CSV, JSON, and more
► Industry Coverage: We serve industries including e-commerce, real estate, travel, finance, and more. Our solution supports use cases like market research, competitive analysis, and business intelligence
► Bulk Data Extraction: We support large-scale data extraction from multiple websites, allowing you to gather millions of data points across industries in a single project
► Scalable Infrastructure: Our platform is built to scale with your needs, allowing seamless extraction for projects of any size, from small pilot projects to ongoing, large-scale data extraction
― Why Choose Altosight? ―
✔ Unlimited Data Points: Altosight offers unlimited free attributes, meaning you can extract as many data points from a page as you need without extra charges
✔ Proprietary Anti-Blocking Technology: Altosight utilizes proprietary techniques to bypass blocking mechanisms, including CAPTCHAs, Cloudflare, and other obstacles. This ensures uninterrupted access to data, no matter how complex the target websites are
✔ Flexible Across Industries: Our crawlers easily adapt across industries, including e-commerce, real estate, finance, and more. We offer customized data solutions tailored to specific needs
✔ GDPR & CCPA Compliance: Your data is handled securely and ethically, ensuring compliance with GDPR, CCPA and other regulations
✔ No Setup or Infrastructure Costs: Start scraping without worrying about additional costs. We provide a hassle-free experience with fast project deployment
✔ Free Data Delivery Methods: Receive your data via API, CSV, JSON, or FTP at no extra charge. We ensure seamless integration with your systems
✔ Fast Support: Our team is always available via phone and email, resolving over 90% of support tickets within the same day
― Custom Projects & Real-Time Data ―
✦ Tailored Solutions: Every business has unique needs, which is why Altosight offers custom data projects. Contact us for a feasibility analysis, and we’ll design a solution that fits your goals
✦ Real-Time Data: Whether you need real-time data delivery or scheduled updates, we provide the flexibility to receive data when you need it. Track price changes, monitor product trends, or gather...
https://webtechsurvey.com/termshttps://webtechsurvey.com/terms
A complete list of live websites using the data-urls technology, compiled through global website indexing conducted by WebTechSurvey.
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.
Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.
Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.
Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.
Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.
Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with javascript enabled, which excludes web crawlers and API calls.
These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
Hilco Streambank is a trusted marketplace leader dedicated to reliable and transparent service. As the world's largest IPv4 address broker, Hilco Streambank has successfully completed more transfers than any other organization, worldwide, with over $0 billion generated for clients since 2014. The company's team has extensive experience in region internet registry transfer regulations and provides buyers and sellers with expert advice to help reach a deal that meets even the most complex of needs.
Hilco Streambank's online marketplace provides a streamlined and transparent process to transfer the rights to IPv4 assets, including buyer and seller checklists, private brokered solutions, and LEASE IPv4 options. The company also offers the IPv4 Analyzer widget and its ReView digital IP address audit tool, a free tool working with 6connect. With operating presence in all five internet registries, including ARIN, APNIC, RIPE, LACNIC, and AFRINIC, Hilco Streambank is well-positioned to facilitate IPv4 transactions worldwide.
NOTE: To review the latest plan, make sure to filter the "Report Year" column to the latest year.
Data on public websites maintained by or on behalf of the city agencies.
https://data.gov.tw/licensehttps://data.gov.tw/license
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The dataset has been collected in the frame of the Prac1 of the subject Tipology and Data Life Cycle of the Master's Degree in Data Science of the Universitat Oberta de Catalunya (UOC).
The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the larges list on the site).
Original code used to retrieve the dataset can be found on github repository: github.com/scostap/goodreads_bbe_dataset
The data was retrieved in two sets, the first 30000 books and then the remainig 22478. Dates were not parsed and reformated on the second chunk so publishDate and firstPublishDate are representet in a mm/dd/yyyy format for the first 30000 records and Month Day Year for the rest.
Book cover images can be optionally downloaded from the url in the 'coverImg' field. Python code for doing so and an example can be found on the github repo.
The 25 fields of the dataset are:
| Attributes | Definition | Completeness |
| ------------- | ------------- | ------------- |
| bookId | Book Identifier as in goodreads.com | 100 |
| title | Book title | 100 |
| series | Series Name | 45 |
| author | Book's Author | 100 |
| rating | Global goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (ex. Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Editorial | 93 |
| publishDate | publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Number of total ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field, percent of ratings over 2 starts (as in GoodReads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL to cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
Data from the State of California. From website:
Access raw State data files, databases, geographic data, and other data sources. Raw State data files can be reused by citizens and organizations for their own web applications and mashups.
Open. Effectively in the public domain. Terms of use page says:
In general, information presented on this web site, unless otherwise indicated, is considered in the public domain. It may be distributed or copied as permitted by law. However, the State does make use of copyrighted data (e.g., photographs) which may require additional permissions prior to your use. In order to use any information on this web site not owned or created by the State, you must seek permission directly from the owning (or holding) sources. The State shall have the unlimited right to use for any purpose, free of any charge, all information submitted via this site except those submissions made under separate legal contract. The State shall be free to use, for any purpose, any ideas, concepts, or techniques contained in information provided through this site.
This asset includes Superfund site-specific sampling information including location of samples, types of samples, and analytical chemistry characteristics of samples. Information is associated with a particular contaminated sate as there is no national database of this information.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset covers all relevant information on every Afrotropical moth species. The zoogeographic area covered can be defined as the Africa continent south of the Sahara (i.e. excl. Morocco, Algeria, Tunisia, Libya and Egypt), the islands in the Atlantic Ocean: Amsterdam Island, Ascension, Cape Verde Archipelago, Inaccessible Island, St. Helena, São Tomé and Principe, Tristan da Cunha, and the islands in the Indian Ocean: Comores (Anjouan, Grande Comore, Mayotte, Mohéli), Madagascar, Mascarene Islands (La Réunion, Mauritius, Rodrigues), Seychelles (Félicité, Mahé, Praslin, Silhouette, a.o.). Furthermore, also those moth species occurring in the transition zone to the Palaearctic fauna have been included, namely most of the Arabia Peninsula (Kuwait, Oman, Saudi Arabia, United Arab Emirates, Yemen with Socotra) but not Iraq, Jordan and further north. Also, some Saharan species have been included (e. g. Hoggar Mts. in Algeria, Tibesti Mts. in South Libya). Utmost care was taken that the data incorporated in the database are correct. We decline any responsibility in case of damage to soft- or hardware based on information used in this website. Persons retrieving information from this website for their own research or for applied aspects such as pest control programmes, should acknowledge the usage of data from this website in the following format: De Prins, J. & De Prins, W. 2011. Afromoths, online database of Afrotropical moth species (Lepidoptera). World Wide Web electronic publication (www.afromoths.net)
Comprehensive dataset of 540 Website designers in Louisiana, United States as of July, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
https://spectrum.library.concordia.ca/policies.html#TermsOfAccesshttps://spectrum.library.concordia.ca/policies.html#TermsOfAccess
This book is the result of teaching the laboratory component of an introductory course in Database Systems in the Department of Computer Science & Software Engineering, Concordia University, Montreal.. The intent of this part of the course was to have the students create a practical web-based application wherein the database forms the dynamic component of a real life application using a web browser as the user interface.
It was decided to use all open source software, namely, Apache web server, PHP, JavaScript and HTML, and also the open source database which started as MySQL and has since migrated to MariaDB.
The examples given in this book have been run successfully both using MySQL on a Windows platform and MariaDB on a Linux platform without any changes. However, the code may need to be updated as the underlying software systems evolve with time, as functions are deprecated and replaced by others. Hence the user is responsible for making any required changes to any code given in this book.
The readers are also warned of the changing privacy and data usage policy of most web sites. They should be aware that most web sites collect and mine user’s data for private profit.
The authors wish to acknowledge the contribution of many students in the introductory database course over the years whose needs and the involvement of one of the authors in the early days of the web prompted the start of this project in the late part of the 20th century. This was the era of dot com bubble
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data about nola.gov provides a window into how people are interacting with the the City of New Orleans online. The data comes from a unified Google Analytics account for New Orleans. We do not track individuals and we anonymize the IP addresses of all visitors.
https://www.factori.ai/privacy-policyhttps://www.factori.ai/privacy-policy
We provide detailed web activity data from users browsing popular websites worldwide. This comprehensive data allows for in-depth analysis of web behavior, enabling the creation of precise audience segments based on web activity. These segments can be used to target ads effectively, focusing on users' interests and their search or browsing intent.
Our web data reach includes extensive counts across various categories, covering attributes such as country, anonymous ID, IP addresses, search queries, and more.
We dynamically collect and update data, providing the latest insights through the most appropriate method at intervals that best suit your needs, whether daily, weekly, or monthly.
Our web activity data is instrumental for personalized targeting, data enrichment, market intelligence, and enhancing fraud and cybersecurity measures, helping businesses optimize their strategies and security efforts.
Comprehensive dataset of 16,048 Website designers in Netherlands as of July, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
Public Domain Mark 1.0https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
The Sea Around Us is a research initiative at The University of British Columbia (located at the Institute for the Oceans and Fisheries, formerly Fisheries Centre) that assesses the impact of fisheries on the marine ecosystems of the world, and offers mitigating solutions to a range of stakeholders.
The Sea Around Us was initiated in collaboration with The Pew Charitable Trusts in 1999, and in 2014, the Sea Around Us also began a collaboration with The Paul G. Allen Family Foundation to provide African and Asian countries with more accurate and comprehensive fisheries data.
The Sea Around Us provides data and analyses through View Data, articles in peer-reviewed journals, and other media (News). The Sea Around Us regularly update products at the scale of countries’ Exclusive Economic Zones, Large Marine Ecosystems, the High Seas and other spatial scales, and as global maps and summaries.
The Sea Around Us emphasizes catch time series starting in 1950, and related series (e.g., landed value and catch by flag state, fishing sector and catch type), and fisheries-related information on every maritime country (e.g., government subsidies, marine biodiversity). Information is also offered on sub-projects, e.g., the historic expansion of fisheries, the performance of Regional Fisheries Management Organizations, or the likely impact of climate change on fisheries.
The information and data presented on their website is freely available to any user, granted that its source is acknowledged. The Sea Around Us is aware that this information may be incomplete. Please let them know about this via the feedback options available on this website.
If you cite or display any content from the Site, or reference the Sea Around Us, the Sea Around Us – Indian Ocean, the University of British Columbia or the University of Western Australia, in any format, written or otherwise, including print or web publications, presentations, grant applications, websites, other online applications such as blogs, or other works, you must provide appropriate acknowledgement using a citation consistent with the following standard:
When referring to various datasets downloaded from the website, and/or its concept or design, or to several datasets extracted from its underlying databases, cite its architects. Example: Pauly D., Zeller D., Palomares M.L.D. (Editors), 2020. Sea Around Us Concepts, Design and Data (seaaroundus.org).
When referring to a set of values extracted for a given country, EEZ or territory, cite the most recent catch reconstruction report or paper (available on the website) for that country, EEZ or territory. Example: For the Mexican Pacific EEZ, the citation should be “Cisneros-Montemayor AM, Cisneros-Mata MA, Harper S and Pauly D (2015) Unreported marine fisheries catch in Mexico, 1950-2010. Fisheries Centre Working Paper #2015-22, University of British Columbia, Vancouver. 9 p.”, which is accessible on the EEZ page for Mexico (Pacific) on seaaroundus.org.
To help us track the use of Sea Around Us data, we would appreciate you also citing Pauly, Zeller, and Palomares (2020) as the source of the information in an appropriate part of your text;
When using data from our website that are not part of a typical catch reconstruction (e.g., catches by LME or other spatial entity, subsidies given to fisheries, the estuaries in a given country, or the surface area of a given EEZ), cite both the website and the study that generated the underlying database. Many of these can be derived from the ’methods’ texts associated with data pages on seaaroundus.org. Example: Sumaila et al. (2010) for subsides, Alder (2003) for estuaries and Claus et al. (2014) for EEZ delineations, respectively.
The Sea Around Us data are (where not otherwise regulated) under a Creative Commons Attribution Non-Commercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/). Notices regarding copyrights (© The University of British Columbia), license and disclaimer can be found under http://www.seaaroundus.org/terms-and-conditions/. References:
Alder J (2003) Putting the coast in the Sea Around Us Project. The Sea Around Us Newsletter (15): 1-2.
Cisneros-Montemayor AM, Cisneros-Mata MA, Harper S and Pauly D (2015) Unreported marine fisheries catch in Mexico, 1950-2010. Fisheries Centre Working Paper #2015-22, University of British Columbia, Vancouver. 9 p.
Pauly D, Zeller D, and Palomares M.L.D. (Editors) (2020) Sea Around Us Concepts, Design and Data (www.seaaroundus.org)
Claus S, De Hauwere N, Vanhoorne B, Deckers P, Souza Dias F, Hernandez F and Mees J (2014) Marine Regions: Towards a global standard for georeferenced marine names and boundaries. Marine Geodesy 37(2): 99-125.
Sumaila UR, Khan A, Dyck A, Watson R, Munro R, Tydemers P and Pauly D (2010) A bottom-up re-estimation of global fisheries subsidies. Journal of Bioeconomics 12: 201-225.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Code:
Packet_Features_Generator.py & Features.py
To run this code:
pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j
-h, --help show this help message and exit -i TXTFILE input text file -x X Add first X number of total packets as features. -y Y Add first Y number of negative packets as features. -z Z Add first Z number of positive packets as features. -ml Output to text file all websites in the format of websiteNumber1,feature1,feature2,... -s S Generate samples using size s. -j
Purpose:
Turns a text file containing lists of incomeing and outgoing network packet sizes into separate website objects with associative features.
Uses Features.py to calcualte the features.
startMachineLearning.sh & machineLearning.py
To run this code:
bash startMachineLearning.sh
This code then runs machineLearning.py in a tmux session with the nessisary file paths and flags
Options (to be edited within this file):
--evaluate-only to test 5 fold cross validation accuracy
--test-scaling-normalization to test 6 different combinations of scalers and normalizers
Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use
--grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'
Purpose:
Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and provides results using cross validation. These results include the best scaling and normailzation options for each data set as well as the best grid search hyperparameters based on the provided ranges.
Data
Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queres (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality head set.
Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:
First number is a classification number to denote what website, query, or vr action is taking place.
The remaining numbers in each line denote:
The size of a packet,
and the direction it is traveling.
negative numbers denote incoming packets
positive numbers denote outgoing packets
Figure 4 Data
This data uses specific lines from the Virtual Reality.txt file.
The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.
The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.
The .xlsx and .csv file are identical
Each file includes (from right to left):
The origional packet data,
each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,
and the final Cumulative Distrubution Function (CDF) caluclation that generated the Figure 4 Graph.
Comprehensive dataset of 33 Website designers in Ávila, Spain as of June, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The O*NET Database contains hundreds of standardized and occupation-specific descriptors on almost 1,000 occupations covering the entire U.S. economy. The database, which is available to the public at no cost, is continually updated by a multi-method data collection program. Sources of data include: job incumbents, occupational experts, occupational analysts, employer job postings, and customer/professional association input.
Data content areas include:
PredictLeads Key Customers Data provides essential business intelligence by analyzing company relationships, uncovering vendor partnerships, client connections, and strategic affiliations through advanced web scraping and logo recognition. This dataset captures business interactions directly from company websites, offering valuable insights into market positioning, competitive landscapes, and growth opportunities.
Use Cases:
✅ Account Profiling – Gain a 360-degree customer view by mapping company relationships and partnerships. ✅ Competitive Intelligence – Track vendor-client connections and business affiliations to identify key industry players. ✅ B2B Lead Targeting – Prioritize leads based on their business relationships, improving sales and marketing efficiency. ✅ CRM Data Enrichment – Enhance company records with detailed key customer data, ensuring data accuracy. ✅ Market Research – Identify emerging trends and industry networks to optimize strategic planning.
Key API Attributes:
📌 PredictLeads Key Customers Data is an indispensable tool for B2B sales, marketing, and market intelligence teams, providing actionable relationship insights to drive targeted outreach, competitor tracking, and strategic decision-making.
PredictLeads Docs: https://docs.predictleads.com/v3/guide/connections_dataset