76 datasets found

Web Scraping Data | Key Customers Domain Name Data | Scanning Logos found on...
datarade.ai
.json
Updated Jun 27, 2024
Cite
PredictLeads (2024). Web Scraping Data | Key Customers Domain Name Data | Scanning Logos found on Websites | 222M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-scraping-data-domain-name-data-business-predictleads
Explore at:
Available download formats: .json
Dataset updated
Jun 27, 2024
Dataset authored and provided by
PredictLeads
Area covered
Malaysia, Northern Mariana Islands, Turkmenistan, Colombia, Benin, Burkina Faso, Oman, Nigeria, Svalbard and Jan Mayen, Curaçao
Description
PredictLeads Key Customers Data provides essential business intelligence by analyzing company relationships, uncovering vendor partnerships, client connections, and strategic affiliations through advanced web scraping and logo recognition. This dataset captures business interactions directly from company websites, offering valuable insights into market positioning, competitive landscapes, and growth opportunities.

Use Cases:

✅ Account Profiling – Gain a 360-degree customer view by mapping company relationships and partnerships.

✅ Competitive Intelligence – Track vendor-client connections and business affiliations to identify key industry players.

✅ B2B Lead Targeting – Prioritize leads based on their business relationships, improving sales and marketing efficiency.

✅ CRM Data Enrichment – Enhance company records with detailed key customer data, ensuring data accuracy.

✅ Market Research – Identify emerging trends and industry networks to optimize strategic planning.

Key API Attributes:

id (string, UUID) – Unique identifier for the company connection.

category (string) – Type of relationship (e.g., vendor, client, partner).

source_category (string) – Where the connection was detected (e.g., partner page, case study).

source_url (string, URL) – Website where the relationship was found.

individual_source_url (string, URL) – Specific page confirming the connection.

context (string) – Extracted description of the business relationship (e.g., "Company X - partners with Company Y to enhance payment processing").

first_seen_at (ISO 8601 date-time) – Date the connection was first detected.

last_seen_at (ISO 8601 date-time) – Most recent confirmation of the relationship.

company1 & company2 (objects) – Details of the two connected companies, including:

- domain (string) – Company website domain.

- company_name (string) – Official company name.

- ticker (string, nullable) – Stock ticker, if available.
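As an illustration of how the attributes above fit together, here is a minimal Python sketch that models one connection record. The field names come from the attribute list; the flat dictionary layout, the helper names `parse_company` and `parse_connection`, and all sample values are assumptions made for this example, so the actual API response may be structured differently.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Company:
    domain: str
    company_name: str
    ticker: Optional[str]  # nullable per the attribute list


@dataclass
class Connection:
    id: str
    category: str
    source_category: str
    source_url: str
    individual_source_url: str
    context: str
    first_seen_at: datetime
    last_seen_at: datetime
    company1: Company
    company2: Company


def parse_company(raw: dict) -> Company:
    # A missing ticker is treated the same as an explicit null.
    return Company(raw["domain"], raw["company_name"], raw.get("ticker"))


def parse_connection(raw: dict) -> Connection:
    # Assumes a flat dict keyed by the attribute names listed above.
    return Connection(
        id=raw["id"],
        category=raw["category"],
        source_category=raw["source_category"],
        source_url=raw["source_url"],
        individual_source_url=raw["individual_source_url"],
        context=raw["context"],
        first_seen_at=datetime.fromisoformat(raw["first_seen_at"]),
        last_seen_at=datetime.fromisoformat(raw["last_seen_at"]),
        company1=parse_company(raw["company1"]),
        company2=parse_company(raw["company2"]),
    )


# Hypothetical record; every value here is made up for illustration.
sample = {
    "id": "00000000-0000-0000-0000-000000000000",
    "category": "vendor",
    "source_category": "case study",
    "source_url": "https://example.com",
    "individual_source_url": "https://example.com/customers",
    "context": "Company X - partners with Company Y to enhance payment processing",
    "first_seen_at": "2024-01-15T08:30:00+00:00",
    "last_seen_at": "2024-06-01T12:00:00+00:00",
    "company1": {"domain": "example.com", "company_name": "Company X", "ticker": None},
    "company2": {"domain": "example.org", "company_name": "Company Y", "ticker": None},
}

connection = parse_connection(sample)
# Whole days between first and last detection of the relationship.
days_observed = (connection.last_seen_at - connection.first_seen_at).days
print(connection.category, days_observed)
```

Parsing the two timestamps into `datetime` objects makes it straightforward to measure how long a relationship has been observed. Timestamps with a trailing `Z` are only accepted by `datetime.fromisoformat` from Python 3.11 onward.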

📌 PredictLeads Key Customers Data is an indispensable tool for B2B sales, marketing, and market intelligence teams, providing actionable relationship insights to drive targeted outreach, competitor tracking, and strategic decision-making.

API Example: https://docs.predictleads.com/v3/guide/connections_dataset/data_model
Altosight | AI Custom Web Scraping Data | 100% Global | Free Unlimited Data...
datarade.ai
.json, .csv, .xls
Updated Sep 7, 2024
+ more versions
Cite
Altosight (2024). Altosight | AI Custom Web Scraping Data | 100% Global | Free Unlimited Data Points | Bypassing All CAPTCHAs & Blocking Mechanisms | GDPR Compliant [Dataset]. https://datarade.ai/data-products/altosight-ai-custom-web-scraping-data-100-global-free-altosight
Explore at:
Available download formats: .json, .csv, .xls
Dataset updated
Sep 7, 2024
Dataset authored and provided by
Altosight
Area covered
Wallis and Futuna, Paraguay, Tajikistan, Chile, Svalbard and Jan Mayen, Guatemala, Singapore, Côte d'Ivoire, Greenland, Czech Republic
Description
Altosight | AI Custom Web Scraping Data

✦ Altosight provides global web scraping data services with AI-powered technology that bypasses CAPTCHAs, blocking mechanisms, and handles dynamic content.

We extract data from marketplaces like Amazon, aggregators, e-commerce, and real estate websites, ensuring comprehensive and accurate results.

✦ Our solution offers free unlimited data points across any project, with no additional setup costs.

We deliver data through flexible methods such as API, CSV, JSON, and FTP, all at no extra charge.

― Key Use Cases ―

➤ Price Monitoring & Repricing Solutions

🔹 Automatic repricing, AI-driven repricing, and custom repricing rules

🔹 Receive price suggestions via API or CSV to stay competitive

🔹 Track competitors in real time or at scheduled intervals

➤ E-commerce Optimization

🔹 Extract product prices, reviews, ratings, images, and trends

🔹 Identify trending products and enhance your e-commerce strategy

🔹 Build dropshipping tools or marketplace optimization platforms with our data

➤ Product Assortment Analysis

🔹 Extract the entire product catalog from competitor websites

🔹 Analyze product assortment to refine your own offerings and identify gaps

🔹 Understand competitor strategies and optimize your product lineup

➤ Marketplaces & Aggregators

🔹 Crawl entire product categories and track best-sellers

🔹 Monitor position changes across categories

🔹 Identify which eRetailers sell specific brands and which SKUs for better market analysis

➤ Business Website Data

🔹 Extract detailed company profiles, including financial statements, key personnel, industry reports, and market trends, enabling in-depth competitor and market analysis

🔹 Collect customer reviews and ratings from business websites to analyze brand sentiment and product performance, helping businesses refine their strategies

➤ Domain Name Data

🔹 Access comprehensive data, including domain registration details, ownership information, expiration dates, and contact information. Ideal for market research, brand monitoring, lead generation, and cybersecurity efforts

➤ Real Estate Data

🔹 Access property listings, prices, and availability

🔹 Analyze trends and opportunities for investment or sales strategies

― Data Collection & Quality ―

► Publicly Sourced Data: Altosight collects web scraping data from publicly available websites, online platforms, and industry-specific aggregators

► AI-Powered Scraping: Our technology handles dynamic content, JavaScript-heavy sites, and pagination, ensuring complete data extraction

► High Data Quality: We clean and structure unstructured data, ensuring it is reliable, accurate, and delivered in formats such as API, CSV, JSON, and more

► Industry Coverage: We serve industries including e-commerce, real estate, travel, finance, and more. Our solution supports use cases like market research, competitive analysis, and business intelligence

► Bulk Data Extraction: We support large-scale data extraction from multiple websites, allowing you to gather millions of data points across industries in a single project

► Scalable Infrastructure: Our platform is built to scale with your needs, allowing seamless extraction for projects of any size, from small pilot projects to ongoing, large-scale data extraction

― Why Choose Altosight? ―

✔ Unlimited Data Points: Altosight offers unlimited free attributes, meaning you can extract as many data points from a page as you need without extra charges

✔ Proprietary Anti-Blocking Technology: Altosight utilizes proprietary techniques to bypass blocking mechanisms, including CAPTCHAs, Cloudflare, and other obstacles. This ensures uninterrupted access to data, no matter how complex the target websites are

✔ Flexible Across Industries: Our crawlers easily adapt across industries, including e-commerce, real estate, finance, and more. We offer customized data solutions tailored to specific needs

✔ GDPR & CCPA Compliance: Your data is handled securely and ethically, ensuring compliance with GDPR, CCPA and other regulations

✔ No Setup or Infrastructure Costs: Start scraping without worrying about additional costs. We provide a hassle-free experience with fast project deployment

✔ Free Data Delivery Methods: Receive your data via API, CSV, JSON, or FTP at no extra charge. We ensure seamless integration with your systems

✔ Fast Support: Our team is always available via phone and email, resolving over 90% of support tickets within the same day

― Custom Projects & Real-Time Data ―

✦ Tailored Solutions: Every business has unique needs, which is why Altosight offers custom data projects. Contact us for a feasibility analysis, and we’ll design a solution that fits your goals

✦ Real-Time Data: Whether you need real-time data delivery or scheduled updates, we provide the flexibility to receive data when you need it. Track price changes, monitor product trends, or gather...
A web tracking data set of online browsing behavior of 2,148 users
zenodo.org
explore.openaire.eu
+1 more
application/gzip, txt +1
Updated May 14, 2021
+ more versions
Cite
Juhi Kulshrestha; Marcos Oliveira; Orkut Karacalik; Denis Bonnay; Claudia Wagner (2021). A web tracking data set of online browsing behavior of 2,148 users [Dataset]. http://doi.org/10.5281/zenodo.4757574
Explore at:
Available download formats: zip, txt, application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.4757574
Dataset updated
May 14, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Juhi Kulshrestha; Marcos Oliveira; Orkut Karacalik; Denis Bonnay; Claudia Wagner
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This anonymized data set consists of one month (October 2018) of web tracking data from 2,148 German users. For each user, the data contains the anonymized URL of each webpage the user visited, the domain of the webpage, and the category of the domain (one of 41 distinct categories). In total, these 2,148 users made 9,151,243 URL visits, spanning 49,918 unique domains. For each user in our data set, we have self-reported information (collected via a survey) about their gender and age.
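To give a sense of scale, the totals quoted in the description can be turned into rough averages. The short Python sketch below uses only the three figures stated above; the averages themselves are derived here and are not reported by the authors.

```python
# Figures quoted in the dataset description.
USERS = 2_148
URL_VISITS = 9_151_243
UNIQUE_DOMAINS = 49_918
DAYS_IN_OCTOBER = 31

# Derived averages (not reported in the source; computed for illustration).
visits_per_user = URL_VISITS / USERS
visits_per_user_per_day = visits_per_user / DAYS_IN_OCTOBER
visits_per_domain = URL_VISITS / UNIQUE_DOMAINS

print(f"visits per user:         {visits_per_user:.1f}")
print(f"visits per user per day: {visits_per_user_per_day:.1f}")
print(f"visits per domain:       {visits_per_domain:.1f}")
```

These are means over all users and domains. Browsing data of this kind is usually heavily skewed, so typical per-user and per-domain values may be quite different.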

We acknowledge the support of Respondi AG, which provided the web tracking and survey data free of charge for research purposes, with special thanks to François Erner and Luc Kalaora at Respondi for their insights and help with data extraction.

The data set is analyzed in the following paper:

Kulshrestha, J., Oliveira, M., Karacalik, O., Bonnay, D., Wagner, C. "Web Routineness and Limits of Predictability: Investigating Demographic and Behavioral Differences Using Web Tracking Data." Proceedings of the International AAAI Conference on Web and Social Media. 2021. https://arxiv.org/abs/2012.15112.

The code used to analyze the data is also available at https://github.com/gesiscss/web_tracking.

If you use data or code from this repository, please cite the paper above and the Zenodo link.
Top syndicated pages from CDC.gov by weekly page views
data.cdc.gov
data.virginia.gov
+3 more
application/rdfxml +5
Updated May 27, 2025
Cite
Office of the Associate Director for Communication, Division of News and Electronic Media (2025). Top syndicated pages from CDC.gov by weekly page views [Dataset]. https://data.cdc.gov/w/rppv-wbiv/tdwk-ruhb?cur=b3roiUNpULG
Explore at:
Available download formats: application/rdfxml, tsv, application/rssxml, csv, xml, json
Dataset updated
May 27, 2025
Dataset authored and provided by
Office of the Associate Director for Communication, Division of News and Electronic Media
Description
The CDC Content Syndication site at https://tools.cdc.gov/syndication/ allows you to import content from CDC websites directly into your own website or application. These services are provided free of charge from CDC. The data shown in this table represent the weekly top page views from CDC.gov offered by syndication.
Data from: Structural Profiling of Web Sites in the Wild
data.niaid.nih.gov
zenodo.org
Updated Jun 10, 2020
Cite
Hallé, Sylvain (2020). Structural Profiling of Web Sites in the Wild [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3718597
Explore at:
Dataset updated
Jun 10, 2020
Dataset provided by
Chamberland-Thibeault, Xavier
Hallé, Sylvain
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains and processes results of a large-scale survey of 708 websites, made in December 2019, in order to measure various features related to their size and structure: DOM tree size, maximum degree, depth, diversity of element types and CSS classes, among others. The goal of this research is to serve as a reference point for studies that include an empirical evaluation on samples of web pages.
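The structural features named above (DOM tree size, maximum degree, depth, and the diversity of element types and CSS classes) can be computed for a single page with the Python standard library. The sketch below is not the survey's own tooling; the class name `DomProfiler`, the function `profile`, and the sample markup are hypothetical, and the code assumes reasonably well-formed HTML in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not stay on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class DomProfiler(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []        # child counters of the currently open elements
        self.size = 0          # total number of elements seen
        self.max_depth = 0     # deepest nesting level (root element = 1)
        self.max_degree = 0    # largest number of children of any element
        self.tags = set()
        self.css_classes = set()

    def handle_starttag(self, tag, attrs):
        self.size += 1
        self.tags.add(tag)
        for name, value in attrs:
            if name == "class" and value:
                self.css_classes.update(value.split())
        if self.stack:
            # Register this element as a child of the innermost open element.
            self.stack[-1] += 1
            self.max_degree = max(self.max_degree, self.stack[-1])
        self.max_depth = max(self.max_depth, len(self.stack) + 1)
        if tag not in VOID_ELEMENTS:
            self.stack.append(0)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.stack:
            self.stack.pop()


def profile(html: str) -> dict:
    parser = DomProfiler()
    parser.feed(html)
    parser.close()
    return {
        "size": parser.size,
        "max_degree": parser.max_degree,
        "depth": parser.max_depth,
        "element_types": len(parser.tags),
        "css_classes": len(parser.css_classes),
    }


# Hypothetical page used only to exercise the profiler.
page = ('<html><body><div class="a b"><p>x</p><p>y</p><br></div>'
        '<img src="i.png"></body></html>')
result = profile(page)
print(result)
```

Real pages often omit closing tags (for example on `<p>` or `<li>`), which a browser's parser repairs but this sketch does not. A measurement on pages in the wild would more likely read these values from a rendered DOM.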

See the Readme.md file inside the archive for more details about its contents.
Web2Code
huggingface.co
Updated Oct 22, 2024
Cite
Mohamed Bin Zayed University of Artificial Intelligence (2024). Web2Code [Dataset]. https://huggingface.co/datasets/MBZUAI/Web2Code
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 22, 2024
Dataset authored and provided by
Mohamed Bin Zayed University of Artificial Intelligence
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Details

Our Web2Code instruction tuning dataset construction and instruction generation process involves four key components: (1) Creation of new webpage image-code pair data: We generated high-quality HTML webpage-code pairs following the CodeAlpaca prompt using GPT-3.5 and convert them into instruction-following data. (2) Refinement of existing webpage code generation data: We transform existing datasets including into an instruction-following data format similar to LLaVA… See the full description on the dataset page: https://huggingface.co/datasets/MBZUAI/Web2Code.
Data from: ArcGIS Enterprise
margig-edt.hub.arcgis.com
Updated May 2, 2019
Cite
Esri European National Government Team (2019). ArcGIS Enterprise [Dataset]. https://margig-edt.hub.arcgis.com/datasets/arcgis-enterprise
Explore at:
Dataset updated
May 2, 2019
Dataset provided by
Esri (http://esri.com/)
Authors
Esri European National Government Team
Description
ArcGIS Enterprise puts collaboration and flexibility at the center of your organization's GIS. It pairs industry-leading mapping and analytics capabilities with a dedicated Web GIS infrastructure to organize and share your work on any device, anywhere, at any time.
Iris Webpage
figshare.com
html
Updated Mar 9, 2020
Cite
Jesus Rogel-Salazar (2020). Iris Webpage [Dataset]. http://doi.org/10.6084/m9.figshare.7053392.v4
Explore at:
Available download formats: html
Unique identifier
https://doi.org/10.6084/m9.figshare.7053392.v4
Dataset updated
Mar 9, 2020
Dataset provided by
figshare
Authors
Jesus Rogel-Salazar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A simple web page containing Fisher's Iris Dataset.
Webpage Tamper-Proof Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 8, 2025
+ more versions
Cite
Market Research Forecast (2025). Webpage Tamper-Proof Report [Dataset]. https://www.marketresearchforecast.com/reports/webpage-tamper-proof-30228
Explore at:
Available download formats: pdf, doc, ppt
Dataset updated
Mar 8, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global webpage tamper-proof market is experiencing robust growth, driven by increasing concerns over data integrity and security breaches across various industries. The rising adoption of cloud-based solutions and the expanding digital footprint of SMEs and large enterprises are key catalysts. While the precise market size in 2025 is not provided, considering a plausible CAGR of 15% (a reasonable estimate based on cybersecurity market growth trends) and assuming a 2024 market size of $500 million (a conservative estimate given the presence of numerous established and emerging players), the 2025 market size could be estimated at approximately $575 million. This growth is further fueled by the escalating sophistication of cyberattacks and the stringent regulatory compliance requirements demanding tamper-evident solutions. The market is segmented by deployment type (cloud-based and on-premise) and user type (SMEs and large enterprises), with cloud-based solutions witnessing faster adoption due to their scalability and cost-effectiveness. Geographic expansion is also a significant factor, with North America and Europe currently holding substantial market share, though the Asia-Pacific region is poised for significant growth due to increasing digitalization and rising cybersecurity awareness. However, factors such as the high initial investment costs associated with implementing tamper-proof solutions and the complexity of integrating them into existing systems could pose challenges to market expansion. Despite the challenges, the long-term outlook remains positive, with a projected sustained growth trajectory through 2033. This growth will be fueled by advancements in technology, such as AI-powered security solutions and blockchain technology integration, further enhancing the reliability and effectiveness of webpage tamper-proof measures. 
The competitive landscape is characterized by a mix of established cybersecurity giants and innovative startups, leading to increased innovation and competitive pricing. This competitive environment drives continuous improvement in the quality, affordability, and accessibility of webpage tamper-proof solutions. The market's evolution will likely see a greater emphasis on proactive security measures, predictive analytics, and improved user experience to seamlessly integrate security without compromising website functionality.
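The 2025 estimate in the description follows from simple compound growth: a base of $500 million growing at 15% for one year. The Python sketch below reproduces that arithmetic using the report's own assumed inputs. The function name `project_market_size` is hypothetical, and carrying the same 15% rate forward to 2033 is purely an illustrative assumption, since the report does not state a rate for the full forecast period.

```python
def project_market_size(base_size: float, cagr: float, years: int) -> float:
    """Compound a base market size forward by `years` at a constant annual rate."""
    return base_size * (1 + cagr) ** years


# Inputs stated as assumptions in the report description (USD millions).
BASE_2024_MUSD = 500.0
ASSUMED_CAGR = 0.15

# One year of growth gives the 2025 figure quoted in the description.
size_2025 = project_market_size(BASE_2024_MUSD, ASSUMED_CAGR, 1)

# Illustrative only: the same rate held constant for nine years, to 2033.
size_2033 = project_market_size(BASE_2024_MUSD, ASSUMED_CAGR, 9)

print(f"2025: ${size_2025:,.0f}M")
print(f"2033 (constant-rate extrapolation): ${size_2033:,.0f}M")
```

Because both the base size and the growth rate are estimates rather than measured values, any figure derived this way inherits their uncertainty.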
Noise of Web Dataset
paperswithcode.com
Updated Aug 1, 2024
+ more versions
Cite
(2024). Noise of Web Dataset [Dataset]. https://paperswithcode.com/dataset/noise-of-web-now
Explore at:
Dataset updated
Aug 1, 2024
Description
Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark for robust image-text matching/retrieval models. It contains 100K image-text pairs consisting of website pages and multilingual website meta-descriptions (98,000 pairs for training, 1,000 for validation, and 1,000 for testing). NoW has two main characteristics: it has no human annotations, and its noisy pairs are naturally captured. The source image data of NoW is obtained by taking screenshots when accessing web pages on a mobile user interface (MUI) at 720 × 1280 resolution, and the meta-description field in the HTML source code is parsed as the caption. In NCR (the predecessor of NCL), each image in all datasets was preprocessed using the Faster-RCNN detector provided by the Bottom-up Attention Model to generate 36 region proposals, and each proposal was encoded as a 2048-dimensional feature. Thus, following NCR, we release the features instead of raw images for fair comparison. However, we cannot simply use detection methods like Faster-RCNN to extract image features, since it is trained on real-world animals and objects from MS-COCO. To tackle this, we adapt APT as the detection model since it is trained on MUI data. Then, we capture the 768-dimensional features of the top 36 objects for each image. Because the data collection process is automated and not human-curated, the noise in NoW is highly authentic and intrinsic. The estimated noise ratio of this dataset is nearly 70%.
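The numbers in this description can be checked against each other with a few lines of Python. The split sizes, the 36 objects per image, the 768-dimensional features, and the roughly 70% noise ratio come from the description. The float32 storage type and the resulting size estimate are assumptions made here for illustration.

```python
# Split sizes stated in the description.
TRAIN, VAL, TEST = 98_000, 1_000, 1_000
total_pairs = TRAIN + VAL + TEST

# Feature layout stated in the description: 36 objects x 768 dimensions.
OBJECTS_PER_IMAGE = 36
FEATURE_DIM = 768
values_per_image = OBJECTS_PER_IMAGE * FEATURE_DIM

# "Nearly 70%" noise, treated here as exactly 70% for a rough count.
NOISE_PERCENT = 70
approx_noisy_pairs = total_pairs * NOISE_PERCENT // 100

# Assumption: features stored as 4-byte float32 values.
BYTES_PER_VALUE = 4
approx_feature_gib = total_pairs * values_per_image * BYTES_PER_VALUE / 2**30

print(total_pairs, values_per_image, approx_noisy_pairs)
print(f"approx. feature storage: {approx_feature_gib:.1f} GiB")
```

The split sizes sum to the stated 100K pairs, and each image is represented by 27,648 feature values.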
BIA BOGS Open Data Site
catalog.data.gov
Updated Jan 20, 2024
+ more versions
Cite
Bureau of Indian Affairs (2024). BIA BOGS Open Data Site [Dataset]. https://catalog.data.gov/dataset/opendata-1-bia-geospatial-hub-arcgis-com
Explore at:
Dataset updated
Jan 20, 2024
Dataset provided by
Bureau of Indian Affairs (http://www.bia.gov/)
Description
This site provides National level geospatial data within the open public domain that can be useful to support tribal community resiliency, research, and more. The data is available for download as CSV, KML, Shapefile, and accessible via web services to support application development and data visualization. This site contains data created and maintained by the Branch of Geospatial Support.
Dataset of books called Creating your first Web page
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called Creating your first Web page [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Creating+your+first+Web+page
Explore at:
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered where the book is Creating your first Web page. It features 7 columns including author, publication date, language, and book publisher.
Global Web Data | Web Scraping Data | Job Postings Data | Source: Company...
datarade.ai
.json
+ more versions
Cite
PredictLeads, Global Web Data | Web Scraping Data | Job Postings Data | Source: Company Website | 206M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-data-web-scraping-data-job-postings-dat-predictleads
Explore at:
Available download formats: .json
Dataset authored and provided by
PredictLeads
Area covered
Bosnia and Herzegovina, Kuwait, French Guiana, Virgin Islands (British), Northern Mariana Islands, Kosovo, El Salvador, Comoros, Guadeloupe, Bonaire
Description
PredictLeads Job Openings Data provides high-quality hiring insights sourced directly from company websites - not job boards. Using advanced web scraping technology, our dataset offers real-time access to job trends, salaries, and skills demand, making it a valuable resource for B2B sales, recruiting, investment analysis, and competitive intelligence.

Key Features:

✅ 206M+ Job Postings Tracked – Data sourced from 1.8M+ company websites worldwide.

✅ 7M+ Active Job Openings – Updated in real time to reflect hiring demand.

✅ Salary & Compensation Insights – Extract salary ranges, contract types, and job seniority levels.

✅ Technology & Skill Tracking – Identify emerging tech trends and industry demands.

✅ Company Data Enrichment – Link job postings to employer domains, firmographics, and growth signals.

✅ Web Scraping Precision – Directly sourced from employer websites for unmatched accuracy.

Primary Attributes:

id (string, UUID) – Unique identifier for the job posting.

type (string, constant: "job_opening") – Object type.

title (string) – Job title.

description (string) – Full job description, extracted from the job listing.

url (string, URL) – Direct link to the job posting.

first_seen_at (string, ISO 8601 date-time) – Timestamp when the job was first detected.

last_seen_at (string, ISO 8601 date-time) – Timestamp when the job was last detected.

last_processed_at (string, ISO 8601 date-time) – Timestamp when the job data was last processed.

Job Metadata:

contract_types (array of strings) – Type of employment (e.g., "full time", "part time", "contract").

categories (array of strings) – Job categories (e.g., "engineering", "marketing").

seniority (string) – Seniority level of the job (e.g., "manager", "non_manager").

status (string) – Job status (e.g., "open", "closed").

language (string) – Language of the job posting.

location (string) – Full location details as listed in the job description.

Location Data (location_data) (array of objects)

city (string, nullable) – City where the job is located.

state (string, nullable) – State or region of the job location.

zip_code (string, nullable) – Postal/ZIP code.

country (string, nullable) – Country where the job is located.

region (string, nullable) – Broader geographical region.

continent (string, nullable) – Continent name.

fuzzy_match (boolean) – Indicates whether the location was inferred.

Salary Data (salary_data)

salary (string) – Salary range extracted from the job listing.

salary_low (float, nullable) – Minimum salary in original currency.

salary_high (float, nullable) – Maximum salary in original currency.

salary_currency (string, nullable) – Currency of the salary (e.g., "USD", "EUR").

salary_low_usd (float, nullable) – Converted minimum salary in USD.

salary_high_usd (float, nullable) – Converted maximum salary in USD.

salary_time_unit (string, nullable) – Time unit for the salary (e.g., "year", "month", "hour").

Occupational Data (onet_data) (object, nullable)

code (string, nullable) – ONET occupation code.

family (string, nullable) – Broad occupational family (e.g., "Computer and Mathematical").

occupation_name (string, nullable) – Official ONET occupation title.

Additional Attributes:

tags (array of strings, nullable) – Extracted skills and keywords (e.g., "Python", "JavaScript").
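Many of the attributes above are nullable, so code that consumes a job record has to tolerate missing values. The Python sketch below shows one way to do that for the salary and tag fields. The nesting of `salary_data` inside the record, the helper names, and the sample values are assumptions for illustration, so the real response layout may differ.

```python
from typing import Optional


def salary_midpoint_usd(job: dict) -> Optional[float]:
    """Return the midpoint of the USD salary range, or None if no bound is known."""
    salary = job.get("salary_data") or {}
    low = salary.get("salary_low_usd")
    high = salary.get("salary_high_usd")
    if low is None and high is None:
        return None
    if low is None:
        return high
    if high is None:
        return low
    return (low + high) / 2


def is_open_with_tag(job: dict, tag: str) -> bool:
    """True if the job is open and lists the given tag (case-insensitive)."""
    tags = job.get("tags") or []  # tags is nullable per the attribute list
    return job.get("status") == "open" and tag.lower() in (t.lower() for t in tags)


# Hypothetical records; all values are invented for illustration.
job_with_salary = {
    "type": "job_opening",
    "title": "Data Engineer",
    "status": "open",
    "tags": ["Python", "SQL"],
    "salary_data": {
        "salary_low_usd": 100000.0,
        "salary_high_usd": 120000.0,
        "salary_time_unit": "year",
    },
}
job_without_salary = {
    "type": "job_opening",
    "title": "Office Manager",
    "status": "closed",
    "tags": None,
    "salary_data": None,
}

midpoint = salary_midpoint_usd(job_with_salary)
missing = salary_midpoint_usd(job_without_salary)
print(midpoint, missing, is_open_with_tag(job_with_salary, "python"))
```

When comparing salaries across postings, `salary_time_unit` also matters, since an hourly and a yearly figure are not directly comparable. This sketch leaves that normalization out.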

📌 Trusted by enterprises, recruiters, and investors for high-precision job market insights.

Response Example: https://docs.predictleads.com/v3/api_endpoints/job_openings_dataset/retrieve_company_s_job_openings
Web page with links to air quality and emissions data | gimi9.com
gimi9.com
Updated Nov 30, 2020
Cite
(2020). Web page with links to air quality and emissions data | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_web-page-with-links-to-air-quality-and-emissions-data
Explore at:
Dataset updated
Nov 30, 2020
Description
🇺🇸 United States
Jornada Basin LTER: Wireless meteorological station at NPP T-EAST site:...
catalog.data.gov
search.dataone.org
+4 more
Updated Apr 21, 2025
+ more versions
Cite
Agricultural Research Service (2025). Jornada Basin LTER: Wireless meteorological station at NPP T-EAST site: 30-minute summary data: 2013 - ongoing [Dataset]. https://catalog.data.gov/dataset/jornada-basin-lter-wireless-meteorological-station-at-npp-t-east-site-30-minute-summary-da-4e1c2
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service
Description
30-minute summary data at NPP T-EAST met station. Average air temperature, relative humidity, wind speed, wind direction, and solar radiation are measured and calculated based on a 1-second scan rate of all sensors located at an automated meteorological station installed at the Jornada LTER NPP T-EAST site. Wind speed is measured at 75 cm, 150 cm, and 300 cm, wind direction at approximately 3 m, and air temperature and relative humidity at approximately 2.5 m. Solar radiation is measured at 3 m. This climate station is operated by the Jornada LTER Program. This is an ONGOING dataset. Resources in this dataset: Resource Title: Website Pointer to html file. File Name: Web Page, url: https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-jrn&identifier=210437028 Webpage with information and links to data files for download.
Open Data BR Site Analytics - Top 10 Assets by Referrer
data.brla.gov
application/rdfxml +5
Updated May 28, 2025
Cite
(2025). Open Data BR Site Analytics - Top 10 Assets by Referrer [Dataset]. https://data.brla.gov/w/yc6c-hren/default?cur=Rxj0kpXNB2M
Explore at:
Available download formats: application/rdfxml, application/rssxml, tsv, xml, csv, json
Dataset updated
May 28, 2025
Description
A referrer is the previous webpage a user was on when following a link to this domain. This dataset provides detail about which specific domains users were on and the assets users were sent to.
Referrer information is provided by date, referring domain and name of the asset the user was sent to. Please see Site Analytics: Referrers for more detail about these fields.
The dataset will reflect new Referrer records within a day of when they occur.
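A ranking such as "top 10 assets by referrer" can be produced by counting referral records per referring domain and asset. The Python sketch below shows that aggregation on a handful of invented rows. The keys `date`, `referring_domain`, and `asset_name` mirror the fields named in the description, but the exact column names in the published dataset are an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_assets_by_referrer(rows: Iterable[dict], n: int = 10) -> Dict[str, List[Tuple[str, int]]]:
    """For each referring domain, return its n most frequently referred assets."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        counts[row["referring_domain"]][row["asset_name"]] += 1
    return {domain: assets.most_common(n) for domain, assets in counts.items()}


# Hypothetical referral records, invented for illustration.
sample_rows = [
    {"date": "2025-05-01", "referring_domain": "search.example", "asset_name": "Asset A"},
    {"date": "2025-05-01", "referring_domain": "search.example", "asset_name": "Asset B"},
    {"date": "2025-05-02", "referring_domain": "search.example", "asset_name": "Asset A"},
    {"date": "2025-05-02", "referring_domain": "news.example", "asset_name": "Asset B"},
]

top = top_assets_by_referrer(sample_rows)
for domain, assets in top.items():
    print(domain, assets)
```

Each row here counts as one referral. If the published data already carries a per-row count column, that value would be added to the counter instead of 1.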
Jornada Basin LTER: Wireless meteorological station at NPP M-RABB site:...
catalog.data.gov
dataone.org
+3 more
Updated Apr 21, 2025
+ more versions
Cite
Agricultural Research Service (2025). Jornada Basin LTER: Wireless meteorological station at NPP M-RABB site: Daily summary data: 2013 - ongoing [Dataset]. https://catalog.data.gov/dataset/jornada-basin-lter-wireless-meteorological-station-at-npp-m-rabb-site-daily-summary-data-2-08123
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service
Description
Daily summary data at NPP M-RABB met station. Average/maximum/minimum air temperature; average/maximum relative humidity and wind speed and average wind direction; solar radiation; albedo. These are measured and calculated based on a 1-second scan rate of all sensors located at an automated meteorological station installed at the Jornada LTER NPP M-RABB site. Wind speed is measured at 75 cm, 150 cm, and 300 cm, wind direction at approximately 3 m, and air temperature and relative humidity at approximately 2.5 m. Solar radiation is measured at 3 m. This climate station is operated by the Jornada LTER Program. This is an ONGOING dataset. Resources in this dataset: Resource Title: Website Pointer to html file. File Name: Web Page, url: https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-jrn&identifier=210437053 Webpage with information and links to data files for download.
Webpage capture on the news article of authorities nix NGO census
data.opendevelopmentmekong.net
Updated May 3, 2024
+ more versions
Cite
(2024). Webpage capture on the news article of authorities nix NGO census [Dataset]. https://data.opendevelopmentmekong.net/dataset/webpage-capture-on-the-news-article-of-authorities-nix-ngo-census
Explore at:
Dataset updated
May 3, 2024
Description
This webpage capture is the reference for the labor incidents dataset. It contains news articles from local newspapers.
NVSBE Website - Why Attend
catalog.data.gov
data.va.gov
+2 more
Updated May 1, 2021
Cite
Department of Veterans Affairs (2021). NVSBE Website - Why Attend [Dataset]. https://catalog.data.gov/dataset/nvsbe-website-why-attend
Explore at:
Dataset updated
May 1, 2021
Dataset provided by
United States Department of Veterans Affairs (http://va.gov/)
Description
National Veterans Small Business Engagement website - why attend webpage
Webis-Web-Errors-19
zenodo.org
webis.de
+1 more
csv, png, txt
Updated Jul 24, 2024
+ more versions
Cite
Johannes Kiesel; Fabienne Hubricht; Benno Stein; Martin Potthast (2024). Webis-Web-Errors-19 [Dataset]. http://doi.org/10.5281/zenodo.2640364
Explore at:
Available download formats: csv, png, txt
Unique identifier
https://doi.org/10.5281/zenodo.2640364
Dataset updated
Jul 24, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Johannes Kiesel; Johannes Kiesel; Fabienne Hubricht; Benno Stein; Martin Potthast; Martin Potthast; Fabienne Hubricht; Benno Stein
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis-Web-Errors-19 dataset comprises various annotations for the 10,000 web page archives of the Webis-Web-Archive-17. The annotations record whether the page is (1) mostly advertisement, (2) cut off, (3) still loading, or (4) pornographic, and whether it shows (not/a bit/very) (5) pop-ups, (6) CAPTCHAs, or (7) error messages. If you use this dataset in your research, please cite it using this paper.

PredictLeads (2024). Web Scraping Data | Key Customers Domain Name Data | Scanning Logos found on Websites | 222M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-scraping-data-domain-name-data-business-predictleads

Web Scraping Data | Key Customers Domain Name Data | Scanning Logos found on Websites | 222M+ Records

Explore at:

Available download formats: .json

Dataset updated

Jun 27, 2024

Dataset authored and provided by

PredictLeads

Area covered

Malaysia, Northern Mariana Islands, Turkmenistan, Colombia, Benin, Burkina Faso, Oman, Nigeria, Svalbard and Jan Mayen, Curaçao

Description

PredictLeads Key Customers Data provides essential business intelligence by analyzing company relationships, uncovering vendor partnerships, client connections, and strategic affiliations through advanced web scraping and logo recognition. This dataset captures business interactions directly from company websites, offering valuable insights into market positioning, competitive landscapes, and growth opportunities.

Use Cases:

✅ Account Profiling – Gain a 360-degree customer view by mapping company relationships and partnerships.
✅ Competitive Intelligence – Track vendor-client connections and business affiliations to identify key industry players.
✅ B2B Lead Targeting – Prioritize leads based on their business relationships, improving sales and marketing efficiency.
✅ CRM Data Enrichment – Enhance company records with detailed key customer data, ensuring data accuracy.
✅ Market Research – Identify emerging trends and industry networks to optimize strategic planning.

Key API Attributes:

id (string, UUID) – Unique identifier for the company connection.
category (string) – Type of relationship (e.g., vendor, client, partner).
source_category (string) – Where the connection was detected (e.g., partner page, case study).
source_url (string, URL) – Website where the relationship was found.
individual_source_url (string, URL) – Specific page confirming the connection.
context (string) – Extracted description of the business relationship (e.g., "Company X - partners with Company Y to enhance payment processing").
first_seen_at (ISO 8601 date-time) – Date the connection was first detected.
last_seen_at (ISO 8601 date-time) – Most recent confirmation of the relationship.
company1 & company2 (objects) – Details of the two connected companies, including:
- domain (string) – Company website domain.
- company_name (string) – Official company name.
- ticker (string, nullable) – Stock ticker, if available.
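
The attribute list above can be read as a small record schema. The following Python sketch shows one way to load such a record into typed objects. It is an illustration only: the flat JSON layout, the sample values (example.com, example.org, the all-zero UUID, the timestamps), and the helper names parse_timestamp, parse_company, and parse_connection are assumptions made for this example and are not taken from the PredictLeads API, whose actual response format may differ.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Company:
    domain: str
    company_name: str
    ticker: Optional[str]  # nullable, per the attribute list


@dataclass
class Connection:
    id: str
    category: str
    source_category: str
    source_url: str
    individual_source_url: str
    context: str
    first_seen_at: datetime
    last_seen_at: datetime
    company1: Company
    company2: Company


def parse_timestamp(value: str) -> datetime:
    # Treat a trailing "Z" as UTC; older Python versions reject it in fromisoformat.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def parse_company(raw: dict) -> Company:
    # A missing or null ticker becomes None.
    return Company(raw["domain"], raw["company_name"], raw.get("ticker"))


def parse_connection(raw: dict) -> Connection:
    return Connection(
        id=raw["id"],
        category=raw["category"],
        source_category=raw["source_category"],
        source_url=raw["source_url"],
        individual_source_url=raw["individual_source_url"],
        context=raw["context"],
        first_seen_at=parse_timestamp(raw["first_seen_at"]),
        last_seen_at=parse_timestamp(raw["last_seen_at"]),
        company1=parse_company(raw["company1"]),
        company2=parse_company(raw["company2"]),
    )


# Hypothetical sample record; every value here is invented for illustration.
SAMPLE = json.loads("""
{
  "id": "00000000-0000-0000-0000-000000000000",
  "category": "partner",
  "source_category": "partner page",
  "source_url": "https://example.com",
  "individual_source_url": "https://example.com/partners",
  "context": "Company X - partners with Company Y to enhance payment processing",
  "first_seen_at": "2023-01-15T00:00:00Z",
  "last_seen_at": "2024-06-01T00:00:00Z",
  "company1": {"domain": "example.com", "company_name": "Company X", "ticker": null},
  "company2": {"domain": "example.org", "company_name": "Company Y", "ticker": null}
}
""")

record = parse_connection(SAMPLE)
print(record.category, record.company1.domain, record.company2.company_name)
```

Parsing first_seen_at and last_seen_at into datetime objects makes it straightforward to filter relationships by how recently they were confirmed.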

📌 PredictLeads Key Customers Data is an indispensable tool for B2B sales, marketing, and market intelligence teams, providing actionable relationship insights to drive targeted outreach, competitor tracking, and strategic decision-making.

API Example: https://docs.predictleads.com/v3/guide/connections_dataset/data_model
