76 datasets found
  1. Web Scraping Data | Key Customers Domain Name Data | Scanning Logos found on...

    • datarade.ai
    .json
    Updated Jun 27, 2024
    Cite
    PredictLeads (2024). Web Scraping Data | Key Customers Domain Name Data | Scanning Logos found on Websites | 222M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-scraping-data-domain-name-data-business-predictleads
    Available download formats: .json
    Dataset updated
    Jun 27, 2024
    Dataset authored and provided by
    PredictLeads
    Area covered
    Malaysia, Northern Mariana Islands, Turkmenistan, Colombia, Benin, Burkina Faso, Oman, Nigeria, Svalbard and Jan Mayen, Curaçao
    Description

    PredictLeads Key Customers Data provides essential business intelligence by analyzing company relationships, uncovering vendor partnerships, client connections, and strategic affiliations through advanced web scraping and logo recognition. This dataset captures business interactions directly from company websites, offering valuable insights into market positioning, competitive landscapes, and growth opportunities.

    Use Cases:

    ✅ Account Profiling – Gain a 360-degree customer view by mapping company relationships and partnerships.
    ✅ Competitive Intelligence – Track vendor-client connections and business affiliations to identify key industry players.
    ✅ B2B Lead Targeting – Prioritize leads based on their business relationships, improving sales and marketing efficiency.
    ✅ CRM Data Enrichment – Enhance company records with detailed key customer data, ensuring data accuracy.
    ✅ Market Research – Identify emerging trends and industry networks to optimize strategic planning.

    Key API Attributes:

    • id (string, UUID) – Unique identifier for the company connection.
    • category (string) – Type of relationship (e.g., vendor, client, partner).
    • source_category (string) – Where the connection was detected (e.g., partner page, case study).
    • source_url (string, URL) – Website where the relationship was found.
    • individual_source_url (string, URL) – Specific page confirming the connection.
    • context (string) – Extracted description of the business relationship (e.g., "Company X - partners with Company Y to enhance payment processing").
    • first_seen_at (ISO 8601 date-time) – Date the connection was first detected.
    • last_seen_at (ISO 8601 date-time) – Most recent confirmation of the relationship.
    • company1 & company2 (objects) – Details of the two connected companies, including:
      - domain (string) – Company website domain.
      - company_name (string) – Official company name.
      - ticker (string, nullable) – Stock ticker, if available.
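    The attribute list above maps naturally onto a typed record. The following is a minimal sketch in Python, assuming the field names listed above arrive as a flat dictionary; the `Company` and `Connection` class names and the `parse_connection` helper are hypothetical and are not part of the PredictLeads API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Company:
    domain: str
    company_name: str
    ticker: Optional[str] = None  # nullable per the attribute list


@dataclass
class Connection:
    id: str
    category: str
    source_category: str
    source_url: str
    individual_source_url: str
    context: str
    first_seen_at: datetime
    last_seen_at: datetime
    company1: Company
    company2: Company


def parse_connection(raw: dict) -> Connection:
    """Hypothetical helper: build a Connection from a flat dict of the listed attributes."""
    return Connection(
        id=raw["id"],
        category=raw["category"],
        source_category=raw["source_category"],
        source_url=raw["source_url"],
        individual_source_url=raw["individual_source_url"],
        context=raw["context"],
        # Assumes an explicit UTC offset such as "+00:00" in the timestamp.
        first_seen_at=datetime.fromisoformat(raw["first_seen_at"]),
        last_seen_at=datetime.fromisoformat(raw["last_seen_at"]),
        company1=Company(**raw["company1"]),
        company2=Company(**raw["company2"]),
    )
```

    The real response may nest these fields differently; the linked data model documentation is the authority on the actual layout.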

    📌 PredictLeads Key Customers Data is an indispensable tool for B2B sales, marketing, and market intelligence teams, providing actionable relationship insights to drive targeted outreach, competitor tracking, and strategic decision-making.

    API Example: https://docs.predictleads.com/v3/guide/connections_dataset/data_model

  2. Altosight | AI Custom Web Scraping Data | 100% Global | Free Unlimited Data...

    • datarade.ai
    .json, .csv, .xls
    Updated Sep 7, 2024
    + more versions
    Cite
    Altosight (2024). Altosight | AI Custom Web Scraping Data | 100% Global | Free Unlimited Data Points | Bypassing All CAPTCHAs & Blocking Mechanisms | GDPR Compliant [Dataset]. https://datarade.ai/data-products/altosight-ai-custom-web-scraping-data-100-global-free-altosight
    Available download formats: .json, .csv, .xls
    Dataset updated
    Sep 7, 2024
    Dataset authored and provided by
    Altosight
    Area covered
    Wallis and Futuna, Paraguay, Tajikistan, Chile, Svalbard and Jan Mayen, Guatemala, Singapore, Côte d'Ivoire, Greenland, Czech Republic
    Description

    Altosight | AI Custom Web Scraping Data

    ✦ Altosight provides global web scraping data services with AI-powered technology that bypasses CAPTCHAs and blocking mechanisms and handles dynamic content.

    We extract data from marketplaces like Amazon, aggregators, e-commerce, and real estate websites, ensuring comprehensive and accurate results.

    ✦ Our solution offers free unlimited data points across any project, with no additional setup costs.

    We deliver data through flexible methods such as API, CSV, JSON, and FTP, all at no extra charge.

    ― Key Use Cases ―

    ➤ Price Monitoring & Repricing Solutions

    🔹 Automatic repricing, AI-driven repricing, and custom repricing rules
    🔹 Receive price suggestions via API or CSV to stay competitive
    🔹 Track competitors in real-time or at scheduled intervals

    ➤ E-commerce Optimization

    🔹 Extract product prices, reviews, ratings, images, and trends
    🔹 Identify trending products and enhance your e-commerce strategy
    🔹 Build dropshipping tools or marketplace optimization platforms with our data

    ➤ Product Assortment Analysis

    🔹 Extract the entire product catalog from competitor websites
    🔹 Analyze product assortment to refine your own offerings and identify gaps
    🔹 Understand competitor strategies and optimize your product lineup

    ➤ Marketplaces & Aggregators

    🔹 Crawl entire product categories and track best-sellers
    🔹 Monitor position changes across categories
    🔹 Identify which eRetailers sell specific brands and which SKUs for better market analysis

    ➤ Business Website Data

    🔹 Extract detailed company profiles, including financial statements, key personnel, industry reports, and market trends, enabling in-depth competitor and market analysis

    🔹 Collect customer reviews and ratings from business websites to analyze brand sentiment and product performance, helping businesses refine their strategies

    ➤ Domain Name Data

    🔹 Access comprehensive data, including domain registration details, ownership information, expiration dates, and contact information. Ideal for market research, brand monitoring, lead generation, and cybersecurity efforts

    ➤ Real Estate Data

    🔹 Access property listings, prices, and availability
    🔹 Analyze trends and opportunities for investment or sales strategies

    ― Data Collection & Quality ―

    ► Publicly Sourced Data: Altosight collects web scraping data from publicly available websites, online platforms, and industry-specific aggregators

    ► AI-Powered Scraping: Our technology handles dynamic content, JavaScript-heavy sites, and pagination, ensuring complete data extraction

    ► High Data Quality: We clean and structure unstructured data, ensuring it is reliable, accurate, and delivered in formats such as API, CSV, JSON, and more

    ► Industry Coverage: We serve industries including e-commerce, real estate, travel, finance, and more. Our solution supports use cases like market research, competitive analysis, and business intelligence

    ► Bulk Data Extraction: We support large-scale data extraction from multiple websites, allowing you to gather millions of data points across industries in a single project

    ► Scalable Infrastructure: Our platform is built to scale with your needs, allowing seamless extraction for projects of any size, from small pilot projects to ongoing, large-scale data extraction

    ― Why Choose Altosight? ―

    ✔ Unlimited Data Points: Altosight offers unlimited free attributes, meaning you can extract as many data points from a page as you need without extra charges

    ✔ Proprietary Anti-Blocking Technology: Altosight utilizes proprietary techniques to bypass blocking mechanisms, including CAPTCHAs, Cloudflare, and other obstacles. This ensures uninterrupted access to data, no matter how complex the target websites are

    ✔ Flexible Across Industries: Our crawlers easily adapt across industries, including e-commerce, real estate, finance, and more. We offer customized data solutions tailored to specific needs

    ✔ GDPR & CCPA Compliance: Your data is handled securely and ethically, ensuring compliance with GDPR, CCPA and other regulations

    ✔ No Setup or Infrastructure Costs: Start scraping without worrying about additional costs. We provide a hassle-free experience with fast project deployment

    ✔ Free Data Delivery Methods: Receive your data via API, CSV, JSON, or FTP at no extra charge. We ensure seamless integration with your systems

    ✔ Fast Support: Our team is always available via phone and email, resolving over 90% of support tickets within the same day

    ― Custom Projects & Real-Time Data ―

    ✦ Tailored Solutions: Every business has unique needs, which is why Altosight offers custom data projects. Contact us for a feasibility analysis, and we’ll design a solution that fits your goals

    ✦ Real-Time Data: Whether you need real-time data delivery or scheduled updates, we provide the flexibility to receive data when you need it. Track price changes, monitor product trends, or gather...

  3. A web tracking data set of online browsing behavior of 2,148 users

    • zenodo.org
    • explore.openaire.eu
    • +1 more
    application/gzip, txt +1
    Updated May 14, 2021
    + more versions
    Cite
    Juhi Kulshrestha; Marcos Oliveira; Orkut Karacalik; Denis Bonnay; Claudia Wagner (2021). A web tracking data set of online browsing behavior of 2,148 users [Dataset]. http://doi.org/10.5281/zenodo.4757574
    Available download formats: zip, txt, application/gzip
    Dataset updated
    May 14, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Juhi Kulshrestha; Marcos Oliveira; Orkut Karacalik; Denis Bonnay; Claudia Wagner
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This anonymized data set consists of one month (October 2018) of web tracking data from 2,148 German users. For each user, the data contains the anonymized URL of each webpage visited, the domain of the webpage, and the category of the domain (one of 41 distinct categories). In total, these 2,148 users made 9,151,243 URL visits, spanning 49,918 unique domains. For each user in the data set, we have self-reported information (collected via a survey) about their gender and age.
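    To give a sense of scale, the totals quoted above can be turned into rough per-user and per-domain averages. This is only a back-of-the-envelope sketch using the three figures from the description; the variable names are illustrative, and the averages say nothing about how visits are actually distributed across users.

```python
# Figures quoted in the data set description.
USERS = 2_148
URL_VISITS = 9_151_243
UNIQUE_DOMAINS = 49_918
DAYS_IN_OCTOBER = 31

# Simple averages; the real distribution across users is likely skewed.
visits_per_user = URL_VISITS / USERS
visits_per_user_per_day = visits_per_user / DAYS_IN_OCTOBER
visits_per_domain = URL_VISITS / UNIQUE_DOMAINS

print(f"visits per user:         {visits_per_user:.1f}")
print(f"visits per user per day: {visits_per_user_per_day:.1f}")
print(f"visits per domain:       {visits_per_domain:.1f}")
```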

    We acknowledge the support of Respondi AG, which provided the web tracking and survey data free of charge for research purposes, with special thanks to François Erner and Luc Kalaora at Respondi for their insights and help with data extraction.

    The data set is analyzed in the following paper:

    • Kulshrestha, J., Oliveira, M., Karacalik, O., Bonnay, D., Wagner, C. "Web Routineness and Limits of Predictability: Investigating Demographic and Behavioral Differences Using Web Tracking Data." Proceedings of the International AAAI Conference on Web and Social Media. 2021. https://arxiv.org/abs/2012.15112.

    The code used to analyze the data is also available at https://github.com/gesiscss/web_tracking.

    If you use data or code from this repository, please cite the paper above and the Zenodo link.

  4. Top syndicated pages from CDC.gov by weekly page views

    • data.cdc.gov
    • data.virginia.gov
    • +3 more
    application/rdfxml +5
    Updated May 27, 2025
    Cite
    Office of the Associate Director for Communication, Division of News and Electronic Media (2025). Top syndicated pages from CDC.gov by weekly page views [Dataset]. https://data.cdc.gov/w/rppv-wbiv/tdwk-ruhb?cur=b3roiUNpULG
    Available download formats: application/rdfxml, tsv, application/rssxml, csv, xml, json
    Dataset updated
    May 27, 2025
    Dataset authored and provided by
    Office of the Associate Director for Communication, Division of News and Electronic Media
    Description

    The CDC Content Syndication site at https://tools.cdc.gov/syndication/ allows you to import content from CDC websites directly into your own website or application. These services are provided free of charge from CDC. The data shown in this table represent the weekly top page views from CDC.gov offered by syndication.

  5. Data from: Structural Profiling of Web Sites in the Wild

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 10, 2020
    Cite
    Hallé, Sylvain (2020). Structural Profiling of Web Sites in the Wild [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3718597
    Dataset updated
    Jun 10, 2020
    Dataset provided by
    Chamberland-Thibeault, Xavier
    Hallé, Sylvain
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains the results of a large-scale survey of 708 websites, conducted in December 2019, that measured various features related to their size and structure: DOM tree size, maximum degree, depth, diversity of element types and CSS classes, among others. The goal of this research is to serve as a reference point for studies that include an empirical evaluation on samples of web pages.

    See the Readme.md file inside the archive for more details about its contents.
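    The structural features named above (tree size, depth, maximum degree, diversity of element types and CSS classes) can be illustrated on a single HTML string. The sketch below uses Python's standard `html.parser` and assumes well-formed markup in which every non-void element is explicitly closed; the `DomProfiler` class and `profile` function are hypothetical names, and this is not the tooling the dataset authors used.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class DomProfiler(HTMLParser):
    """Collects simple structural metrics while the parser walks the markup."""

    def __init__(self):
        super().__init__()
        self.size = 0            # total number of elements
        self.max_depth = 0       # deepest nesting level (root element = 1)
        self.max_degree = 0      # largest number of element children of one node
        self.tag_names = set()   # distinct element types
        self.css_classes = set() # distinct CSS class names
        self._child_counts = []  # one child counter per currently open element

    def handle_starttag(self, tag, attrs):
        self.size += 1
        self.tag_names.add(tag)
        for name, value in attrs:
            if name == "class" and value:
                self.css_classes.update(value.split())
        if self._child_counts:
            # This element is a child of the innermost open element.
            self._child_counts[-1] += 1
            self.max_degree = max(self.max_degree, self._child_counts[-1])
        self.max_depth = max(self.max_depth, len(self._child_counts) + 1)
        if tag not in VOID_ELEMENTS:
            self._child_counts.append(0)

    def handle_endtag(self, tag):
        if self._child_counts and tag not in VOID_ELEMENTS:
            self._child_counts.pop()


def profile(html: str) -> dict:
    profiler = DomProfiler()
    profiler.feed(html)
    profiler.close()
    return {
        "size": profiler.size,
        "max_depth": profiler.max_depth,
        "max_degree": profiler.max_degree,
        "element_types": len(profiler.tag_names),
        "css_classes": len(profiler.css_classes),
    }
```

    Real-world pages often omit closing tags, so a browser-grade parser would be needed for measurements comparable to the survey's.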

  6. Web2Code

    • huggingface.co
    Updated Oct 22, 2024
    Cite
    Mohamed Bin Zayed University of Artificial Intelligence (2024). Web2Code [Dataset]. https://huggingface.co/datasets/MBZUAI/Web2Code
    Available format: Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 22, 2024
    Dataset authored and provided by
    Mohamed Bin Zayed University of Artificial Intelligence
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Details

    Our Web2Code instruction tuning dataset construction and instruction generation process involves four key components: (1) Creation of new webpage image-code pair data: We generated high-quality HTML webpage-code pairs following the CodeAlpaca prompt using GPT-3.5 and convert them into instruction-following data. (2) Refinement of existing webpage code generation data: We transform existing datasets including into an instruction-following data format similar to LLaVA… See the full description on the dataset page: https://huggingface.co/datasets/MBZUAI/Web2Code.

  7. Data from: ArcGIS Enterprise

    • margig-edt.hub.arcgis.com
    Updated May 2, 2019
    Cite
    Esri European National Government Team (2019). ArcGIS Enterprise [Dataset]. https://margig-edt.hub.arcgis.com/datasets/arcgis-enterprise
    Dataset updated
    May 2, 2019
    Dataset provided by
    Esri (http://esri.com/)
    Authors
    Esri European National Government Team
    Description

    ArcGIS Enterprise puts collaboration and flexibility at the center of your organization's GIS. It pairs industry-leading mapping and analytics capabilities with a dedicated Web GIS infrastructure to organize and share your work on any device, anywhere, at any time.

  8. Iris Webpage

    • figshare.com
    html
    Updated Mar 9, 2020
    Cite
    Jesus Rogel-Salazar (2020). Iris Webpage [Dataset]. http://doi.org/10.6084/m9.figshare.7053392.v4
    Available download formats: html
    Dataset updated
    Mar 9, 2020
    Dataset provided by
    figshare
    Authors
    Jesus Rogel-Salazar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A simple web page containing Fisher's Iris Dataset.

  9. Webpage Tamper-Proof Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 8, 2025
    + more versions
    Cite
    Market Research Forecast (2025). Webpage Tamper-Proof Report [Dataset]. https://www.marketresearchforecast.com/reports/webpage-tamper-proof-30228
    Available download formats: pdf, doc, ppt
    Dataset updated
    Mar 8, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global webpage tamper-proof market is experiencing robust growth, driven by increasing concerns over data integrity and security breaches across various industries. The rising adoption of cloud-based solutions and the expanding digital footprint of SMEs and large enterprises are key catalysts.

    While the precise market size in 2025 is not provided, considering a plausible CAGR of 15% (a reasonable estimate based on cybersecurity market growth trends) and assuming a 2024 market size of $500 million (a conservative estimate given the presence of numerous established and emerging players), the 2025 market size could be estimated at approximately $575 million. This growth is further fueled by the escalating sophistication of cyberattacks and the stringent regulatory compliance requirements demanding tamper-evident solutions.

    The market is segmented by deployment type (cloud-based and on-premise) and user type (SMEs and large enterprises), with cloud-based solutions witnessing faster adoption due to their scalability and cost-effectiveness. Geographic expansion is also a significant factor, with North America and Europe currently holding substantial market share, though the Asia-Pacific region is poised for significant growth due to increasing digitalization and rising cybersecurity awareness.

    However, factors such as the high initial investment costs associated with implementing tamper-proof solutions and the complexity of integrating them into existing systems could pose challenges to market expansion. Despite the challenges, the long-term outlook remains positive, with a projected sustained growth trajectory through 2033. This growth will be fueled by advancements in technology, such as AI-powered security solutions and blockchain technology integration, further enhancing the reliability and effectiveness of webpage tamper-proof measures.

    The competitive landscape is characterized by a mix of established cybersecurity giants and innovative startups, leading to increased innovation and competitive pricing. This competitive environment drives continuous improvement in the quality, affordability, and accessibility of webpage tamper-proof solutions. The market's evolution will likely see a greater emphasis on proactive security measures, predictive analytics, and improved user experience to seamlessly integrate security without compromising website functionality.
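    The 2025 estimate in the description follows from compounding the assumed 2024 base by the assumed growth rate for one year. A minimal sketch of that arithmetic, using only the two assumed inputs stated in the report summary ($500 million and 15%); the function name is illustrative.

```python
def project_market_size(base_size: float, cagr: float, years: int) -> float:
    """Compound a base market size forward by a constant annual growth rate."""
    return base_size * (1 + cagr) ** years


# Inputs are the report's own stated assumptions, in millions of USD.
SIZE_2024_MUSD = 500.0
ASSUMED_CAGR = 0.15

size_2025 = project_market_size(SIZE_2024_MUSD, ASSUMED_CAGR, years=1)
print(f"Estimated 2025 market size: ${size_2025:.0f} million")
```

    Because both inputs are described as estimates, the result is only as reliable as those assumptions.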

  10. Noise of Web Dataset

    • paperswithcode.com
    Updated Aug 1, 2024
    + more versions
    Cite
    (2024). Noise of Web Dataset [Dataset]. https://paperswithcode.com/dataset/noise-of-web-now
    Dataset updated
    Aug 1, 2024
    Description

    Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark for robust image-text matching/retrieval models. It contains 100K image-text pairs consisting of website pages and multilingual website meta-descriptions (98,000 pairs for training, 1,000 for validation, and 1,000 for testing). NoW has two main characteristics: it has no human annotations, and its noisy pairs are naturally captured. The source image data of NoW is obtained by taking screenshots when accessing web pages on a mobile user interface (MUI) at 720 × 1280 resolution, and the meta-description field in the HTML source code is parsed as the caption. In NCR (the predecessor of NCL), each image in all datasets was preprocessed using the Faster-RCNN detector provided by the Bottom-up Attention Model to generate 36 region proposals, and each proposal was encoded as a 2048-dimensional feature. Thus, following NCR, we release the features instead of raw images for fair comparison. However, we cannot simply use detection methods like Faster-RCNN to extract image features, since it is trained on real-world animals and objects from MS-COCO. To tackle this, we adapt APT as the detection model since it is trained on MUI data. Then, we capture the 768-dimensional features of the top 36 objects for each image. Because the data collection process is automated and not curated by humans, the noise in NoW is highly authentic and intrinsic. The estimated noise ratio of this dataset is nearly 70%.
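    The sizes quoted in the description can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers stated above; the constant names are illustrative, and the noisy-pair count is a rough figure derived from the "nearly 70%" estimate rather than a published statistic.

```python
# Split sizes and feature dimensions as stated in the description.
SPLITS = {"train": 98_000, "validation": 1_000, "test": 1_000}
REGIONS_PER_IMAGE = 36   # top 36 detected objects per screenshot
FEATURE_DIM = 768        # dimensionality of each region feature
NOISE_RATIO = 0.70       # "nearly 70%", treated here as an approximation

total_pairs = sum(SPLITS.values())
values_per_image = REGIONS_PER_IMAGE * FEATURE_DIM
approx_noisy_pairs = round(total_pairs * NOISE_RATIO)

print(f"total pairs:        {total_pairs}")
print(f"values per image:   {values_per_image}")
print(f"approx noisy pairs: {approx_noisy_pairs}")
```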

  11. BIA BOGS Open Data Site

    • catalog.data.gov
    Updated Jan 20, 2024
    + more versions
    Cite
    Bureau of Indian Affairs (2024). BIA BOGS Open Data Site [Dataset]. https://catalog.data.gov/dataset/opendata-1-bia-geospatial-hub-arcgis-com
    Dataset updated
    Jan 20, 2024
    Dataset provided by
    Bureau of Indian Affairs (http://www.bia.gov/)
    Description

    This site provides National level geospatial data within the open public domain that can be useful to support tribal community resiliency, research, and more. The data is available for download as CSV, KML, Shapefile, and accessible via web services to support application development and data visualization. This site contains data created and maintained by the Branch of Geospatial Support.

  12. Dataset of books called Creating your first Web page

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Creating your first Web page [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Creating+your+first+Web+page
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is Creating your first Web page. It features 7 columns including author, publication date, language, and book publisher.

  13. Global Web Data | Web Scraping Data | Job Postings Data | Source: Company...

    • datarade.ai
    .json
    + more versions
    Cite
    PredictLeads, Global Web Data | Web Scraping Data | Job Postings Data | Source: Company Website | 206M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-data-web-scraping-data-job-postings-dat-predictleads
    Available download formats: .json
    Dataset authored and provided by
    PredictLeads
    Area covered
    Bosnia and Herzegovina, Kuwait, French Guiana, Virgin Islands (British), Northern Mariana Islands, Kosovo, El Salvador, Comoros, Guadeloupe, Bonaire
    Description

    PredictLeads Job Openings Data provides high-quality hiring insights sourced directly from company websites - not job boards. Using advanced web scraping technology, our dataset offers real-time access to job trends, salaries, and skills demand, making it a valuable resource for B2B sales, recruiting, investment analysis, and competitive intelligence.

    Key Features:

    ✅ 206M+ Job Postings Tracked – Data sourced from 1.8M+ company websites worldwide.
    ✅ 7M+ Active Job Openings – Updated in real-time to reflect hiring demand.
    ✅ Salary & Compensation Insights – Extract salary ranges, contract types, and job seniority levels.
    ✅ Technology & Skill Tracking – Identify emerging tech trends and industry demands.
    ✅ Company Data Enrichment – Link job postings to employer domains, firmographics, and growth signals.
    ✅ Web Scraping Precision – Directly sourced from employer websites for unmatched accuracy.

    Primary Attributes:

    • id (string, UUID) – Unique identifier for the job posting.
    • type (string, constant: "job_opening") – Object type.
    • title (string) – Job title.
    • description (string) – Full job description, extracted from the job listing.
    • url (string, URL) – Direct link to the job posting.
    • first_seen_at (string, ISO 8601 date-time) – Timestamp when the job was first detected.
    • last_seen_at (string, ISO 8601 date-time) – Timestamp when the job was last detected.
    • last_processed_at (string, ISO 8601 date-time) – Timestamp when the job data was last processed.

    Job Metadata:

    • contract_types (array of strings) – Type of employment (e.g., "full time", "part time", "contract").
    • categories (array of strings) – Job categories (e.g., "engineering", "marketing").
    • seniority (string) – Seniority level of the job (e.g., "manager", "non_manager").
    • status (string) – Job status (e.g., "open", "closed").
    • language (string) – Language of the job posting.
    • location (string) – Full location details as listed in the job description.

    Location Data (location_data) (array of objects)

    • city (string, nullable) – City where the job is located.
    • state (string, nullable) – State or region of the job location.
    • zip_code (string, nullable) – Postal/ZIP code.
    • country (string, nullable) – Country where the job is located.
    • region (string, nullable) – Broader geographical region.
    • continent (string, nullable) – Continent name.
    • fuzzy_match (boolean) – Indicates whether the location was inferred.

    Salary Data (salary_data)

    • salary (string) – Salary range extracted from the job listing.
    • salary_low (float, nullable) – Minimum salary in original currency.
    • salary_high (float, nullable) – Maximum salary in original currency.
    • salary_currency (string, nullable) – Currency of the salary (e.g., "USD", "EUR").
    • salary_low_usd (float, nullable) – Converted minimum salary in USD.
    • salary_high_usd (float, nullable) – Converted maximum salary in USD.
    • salary_time_unit (string, nullable) – Time unit for the salary (e.g., "year", "month", "hour").

    Occupational Data (onet_data) (object, nullable)

    • code (string, nullable) – ONET occupation code.
    • family (string, nullable) – Broad occupational family (e.g., "Computer and Mathematical").
    • occupation_name (string, nullable) – Official ONET occupation title.

    Additional Attributes:

    • tags (array of strings, nullable) – Extracted skills and keywords (e.g., "Python", "JavaScript").
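    Because several of the attributes above are nullable, code that consumes these records needs to guard against missing values. The sketch below assumes each job opening is a dictionary with a `status` field and a nested `salary_data` object using the field names listed above; the helper names and the sample records are hypothetical, and the actual response layout may differ.

```python
from typing import Optional


def salary_midpoint_usd(job: dict) -> Optional[float]:
    """Return the midpoint of the USD salary range, or None if either bound is missing."""
    salary = job.get("salary_data") or {}
    low = salary.get("salary_low_usd")
    high = salary.get("salary_high_usd")
    if low is None or high is None:
        return None
    return (low + high) / 2


def open_jobs(jobs: list) -> list:
    """Keep only postings whose status is "open"."""
    return [job for job in jobs if job.get("status") == "open"]


# Hypothetical sample records, for illustration only.
sample_jobs = [
    {"title": "Data Engineer", "status": "open",
     "salary_data": {"salary_low_usd": 80000.0, "salary_high_usd": 100000.0}},
    {"title": "Analyst", "status": "closed",
     "salary_data": {"salary_low_usd": None, "salary_high_usd": 50000.0}},
    {"title": "Designer", "status": "open"},
]

for job in open_jobs(sample_jobs):
    print(job["title"], salary_midpoint_usd(job))
```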

    📌 Trusted by enterprises, recruiters, and investors for high-precision job market insights.

    Response Example: https://docs.predictleads.com/v3/api_endpoints/job_openings_dataset/retrieve_company_s_job_openings

  14. Web page with links to air quality and emissions data | gimi9.com

    • gimi9.com
    Updated Nov 30, 2020
    Cite
    (2020). Web page with links to air quality and emissions data | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_web-page-with-links-to-air-quality-and-emissions-data
    Dataset updated
    Nov 30, 2020
    Description

    🇺🇸 United States

  15. Jornada Basin LTER: Wireless meteorological station at NPP T-EAST site:...

    • catalog.data.gov
    • search.dataone.org
    • +4 more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Jornada Basin LTER: Wireless meteorological station at NPP T-EAST site: 30-minute summary data: 2013 - ongoing [Dataset]. https://catalog.data.gov/dataset/jornada-basin-lter-wireless-meteorological-station-at-npp-t-east-site-30-minute-summary-da-4e1c2
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    30-minute summary data at NPP T-EAST met station. Average air temperature, relative humidity, wind speed, wind direction, and solar radiation are measured and calculated based on a 1-second scan rate of all sensors located at an automated meteorological station installed at the Jornada LTER NPP T-EAST site. Wind speed is measured at 75 cm, 150 cm, and 300 cm, wind direction at approximately 3 m, and air temperature and relative humidity at approximately 2.5 m. Solar radiation is measured at 3 m. This climate station is operated by the Jornada LTER Program. This is an ONGOING dataset. Resources in this dataset: Resource Title: Website Pointer to html file. File Name: Web Page, url: https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-jrn&identifier=210437028 Webpage with information and links to data files for download
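    Summarizing 1-second scans into a 30-minute record is mostly plain averaging, but wind direction is an angle and needs a circular mean so that readings on either side of north do not average to south. The sketch below shows one common way to do this (a unit-vector mean); it is an illustration only, and the description does not say which averaging method the station's logger actually applies. All names are hypothetical.

```python
import math
from statistics import fmean

# Number of 1-second scans in one 30-minute summary window.
SCANS_PER_WINDOW = 30 * 60


def mean_wind_direction(directions_deg):
    """Circular mean of wind directions in degrees, via unit-vector averaging."""
    east = sum(math.sin(math.radians(d)) for d in directions_deg)
    north = sum(math.cos(math.radians(d)) for d in directions_deg)
    return math.degrees(math.atan2(east, north)) % 360.0


def summarize_window(temps_c, directions_deg):
    """Reduce one window of scans to summary values."""
    return {
        "air_temp_avg_c": fmean(temps_c),
        "wind_dir_avg_deg": mean_wind_direction(directions_deg),
    }
```

    For example, scans at 350° and 10° average to roughly north under this method, whereas an arithmetic mean would give 180°.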

  16. Open Data BR Site Analytics - Top 10 Assets by Referrer

    • data.brla.gov
    application/rdfxml +5
    Updated May 28, 2025
    Cite
    (2025). Open Data BR Site Analytics - Top 10 Assets by Referrer [Dataset]. https://data.brla.gov/w/yc6c-hren/default?cur=Rxj0kpXNB2M
    Available download formats: application/rdfxml, application/rssxml, tsv, xml, csv, json
    Dataset updated
    May 28, 2025
    Description

    A referrer is the previous webpage a user was on when following a link to this domain. This dataset provides detail about which specific domains users were on and the assets users were sent to.

    Referrer information is provided by date, referring domain and name of the asset the user was sent to. Please see Site Analytics: Referrers for more detail about these fields.

    The dataset will reflect new Referrer records within a day of when they occur.
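    The idea of grouping visits by referring domain and destination asset can be sketched with the standard library. The example assumes raw input as pairs of referrer URL and asset name, which is a hypothetical shape chosen for illustration; the published dataset already provides the referring domain as a field, and the function names here are not part of any portal API.

```python
from collections import Counter
from urllib.parse import urlsplit


def referring_domain(referrer_url: str) -> str:
    """Extract the host name from a referrer URL; empty string if there is none."""
    return urlsplit(referrer_url).hostname or ""


def top_assets_by_referrer(visits, n=10):
    """Count (referring domain, asset) pairs and return the n most frequent."""
    counts = Counter()
    for referrer_url, asset_name in visits:
        domain = referring_domain(referrer_url)
        if domain:  # skip direct visits with no referrer
            counts[(domain, asset_name)] += 1
    return counts.most_common(n)
```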

  17. Jornada Basin LTER: Wireless meteorological station at NPP M-RABB site:...

    • catalog.data.gov
    • dataone.org
    • +3 more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Jornada Basin LTER: Wireless meteorological station at NPP M-RABB site: Daily summary data: 2013 - ongoing [Dataset]. https://catalog.data.gov/dataset/jornada-basin-lter-wireless-meteorological-station-at-npp-m-rabb-site-daily-summary-data-2-08123
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    Daily summary data at NPP M-RABB met station. Average/maximum/minimum air temperature; average/maximum relative humidity and wind speed and average wind direction; solar radiation; albedo. These are measured and calculated based on a 1-second scan rate of all sensors located at an automated meteorological station installed at the Jornada LTER NPP M-RABB site. Wind speed is measured at 75 cm, 150 cm, and 300 cm, wind direction at approximately 3 m, and air temperature and relative humidity at approximately 2.5 m. Solar radiation is measured at 3 m. This climate station is operated by the Jornada LTER Program. This is an ONGOING dataset. Resources in this dataset: Resource Title: Website Pointer to html file. File Name: Web Page, url: https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-jrn&identifier=210437053 Webpage with information and links to data files for download

  18. Webpage capture on the news article of authorities nix NGO census

    • data.opendevelopmentmekong.net
    Updated May 3, 2024
    Cite
    (2024). Webpage capture on the news article of authorities nix NGO census [Dataset]. https://data.opendevelopmentmekong.net/dataset/webpage-capture-on-the-news-article-of-authorities-nix-ngo-census
    Dataset updated
    May 3, 2024
    Description

    This webpage capture is the reference for the labor incidents dataset. It contains news articles from local newspapers.

  19. NVSBE Website - Why Attend

    • catalog.data.gov
    • data.va.gov
    • +2 more
    Updated May 1, 2021
    Cite
    Department of Veterans Affairs (2021). NVSBE Website - Why Attend [Dataset]. https://catalog.data.gov/dataset/nvsbe-website-why-attend
    Dataset updated
    May 1, 2021
    Dataset provided by
    United States Department of Veterans Affairs (http://va.gov/)
    Description

    National Veterans Small Business Engagement website - why attend webpage

  20. Webis-Web-Errors-19

    • zenodo.org
    • webis.de
    • +1 more
    csv, png, txt
    Updated Jul 24, 2024
    Cite
    Johannes Kiesel; Fabienne Hubricht; Benno Stein; Martin Potthast (2024). Webis-Web-Errors-19 [Dataset]. http://doi.org/10.5281/zenodo.2640364
    Available download formats: csv, png, txt
    Dataset updated
    Jul 24, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Johannes Kiesel; Fabienne Hubricht; Benno Stein; Martin Potthast
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Webis-Web-Errors-19 comprises various annotations for the 10,000 web page archives of the Webis-Web-Archive-17. The annotations record whether the page is (1) mostly advertisement, (2) cut off, (3) still loading, or (4) pornographic, and whether it shows (not/a bit/very) (5) pop-ups, (6) CAPTCHAs, or (7) error messages. If you use this dataset in your research, please cite it using this paper.
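The seven annotations described above split into four yes/no labels and three graded labels. A minimal sketch of how one annotated page might be modeled follows. The class and field names (`PageAnnotation`, `Degree`, `is_clean`, and so on) are hypothetical and are not taken from the dataset's actual CSV columns, which this listing does not state.

```python
from dataclasses import dataclass
from enum import Enum


class Degree(Enum):
    """Three-level scale used for annotations (5) to (7)."""
    NOT = "not"
    A_BIT = "a bit"
    VERY = "very"


@dataclass
class PageAnnotation:
    # Annotations (1) to (4): binary labels.
    mostly_advertisement: bool
    cut_off: bool
    still_loading: bool
    pornographic: bool
    # Annotations (5) to (7): graded labels.
    pop_ups: Degree
    captchas: Degree
    error_messages: Degree

    def is_clean(self) -> bool:
        """True when no binary flag is set and every graded label is NOT."""
        binary = (self.mostly_advertisement, self.cut_off,
                  self.still_loading, self.pornographic)
        graded = (self.pop_ups, self.captchas, self.error_messages)
        return not any(binary) and all(d is Degree.NOT for d in graded)


# Made-up example: a page that shows a small CAPTCHA and nothing else.
example = PageAnnotation(False, False, False, False,
                         Degree.NOT, Degree.A_BIT, Degree.NOT)
```

A filter such as `is_clean` is one way to select archives without visible problems. Whether that threshold is appropriate depends on the study.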

PredictLeads (2024). Web Scraping Data | Key Customers Domain Name Data | Scanning Logos found on Websites | 222M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-scraping-data-domain-name-data-business-predictleads

Web Scraping Data | Key Customers Domain Name Data | Scanning Logos found on Websites | 222M+ Records

Available download formats: .json
Dataset updated
Jun 27, 2024
Dataset authored and provided by
PredictLeads
Area covered
Malaysia, Northern Mariana Islands, Turkmenistan, Colombia, Benin, Burkina Faso, Oman, Nigeria, Svalbard and Jan Mayen, Curaçao
Description

PredictLeads Key Customers Data provides essential business intelligence by analyzing company relationships, uncovering vendor partnerships, client connections, and strategic affiliations through advanced web scraping and logo recognition. This dataset captures business interactions directly from company websites, offering valuable insights into market positioning, competitive landscapes, and growth opportunities.

Use Cases:

✅ Account Profiling – Gain a 360-degree customer view by mapping company relationships and partnerships.
✅ Competitive Intelligence – Track vendor-client connections and business affiliations to identify key industry players.
✅ B2B Lead Targeting – Prioritize leads based on their business relationships, improving sales and marketing efficiency.
✅ CRM Data Enrichment – Enhance company records with detailed key customer data, ensuring data accuracy.
✅ Market Research – Identify emerging trends and industry networks to optimize strategic planning.

Key API Attributes:

  • id (string, UUID) – Unique identifier for the company connection.
  • category (string) – Type of relationship (e.g., vendor, client, partner).
  • source_category (string) – Where the connection was detected (e.g., partner page, case study).
  • source_url (string, URL) – Website where the relationship was found.
  • individual_source_url (string, URL) – Specific page confirming the connection.
  • context (string) – Extracted description of the business relationship (e.g., "Company X - partners with Company Y to enhance payment processing").
  • first_seen_at (ISO 8601 date-time) – Date the connection was first detected.
  • last_seen_at (ISO 8601 date-time) – Most recent confirmation of the relationship.
  • company1 & company2 (objects) – Details of the two connected companies, including:
      - domain (string) – Company website domain.
      - company_name (string) – Official company name.
      - ticker (string, nullable) – Stock ticker, if available.
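
The attribute list above can be read as a record type. The sketch below models it with Python dataclasses and parses one record. It assumes a flat JSON object whose keys match the attribute names exactly; the real API response may nest or wrap these fields differently, so check the linked data model. `parse_connection`, `parse_timestamp`, and all values in `SAMPLE_RECORD` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Company:
    domain: str
    company_name: str
    ticker: Optional[str]  # nullable per the attribute list


@dataclass
class Connection:
    id: str
    category: str
    source_category: str
    source_url: str
    individual_source_url: str
    context: str
    first_seen_at: datetime
    last_seen_at: datetime
    company1: Company
    company2: Company


def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_connection(raw: dict) -> Connection:
    """Build a Connection from a flat dict (assumed layout, see lead-in)."""
    def company(key: str) -> Company:
        c = raw[key]
        return Company(c["domain"], c["company_name"], c.get("ticker"))

    return Connection(
        id=raw["id"],
        category=raw["category"],
        source_category=raw["source_category"],
        source_url=raw["source_url"],
        individual_source_url=raw["individual_source_url"],
        context=raw["context"],
        first_seen_at=parse_timestamp(raw["first_seen_at"]),
        last_seen_at=parse_timestamp(raw["last_seen_at"]),
        company1=company("company1"),
        company2=company("company2"),
    )


# Made-up record for illustration only.
SAMPLE_RECORD = {
    "id": "00000000-0000-0000-0000-000000000000",
    "category": "partner",
    "source_category": "partner page",
    "source_url": "https://example.com",
    "individual_source_url": "https://example.com/partners",
    "context": "Company X - partners with Company Y to enhance payment processing",
    "first_seen_at": "2024-01-15T08:30:00Z",
    "last_seen_at": "2024-06-01T12:00:00Z",
    "company1": {"domain": "example.com", "company_name": "Company X", "ticker": "CMPX"},
    "company2": {"domain": "example.org", "company_name": "Company Y", "ticker": None},
}
conn = parse_connection(SAMPLE_RECORD)
```

Parsing the two timestamps into `datetime` objects makes it straightforward to compare `first_seen_at` and `last_seen_at`, for example to find relationships that have not been reconfirmed recently.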

📌 PredictLeads Key Customers Data is an indispensable tool for B2B sales, marketing, and market intelligence teams, providing actionable relationship insights to drive targeted outreach, competitor tracking, and strategic decision-making.

API Example: https://docs.predictleads.com/v3/guide/connections_dataset/data_model
