This dataset provides comprehensive contact information extracted from websites in real time. It includes emails, phone numbers, social media profiles, and other contact methods found across website pages. The data is extracted through intelligent parsing of website content, meta information, and structured data. Users can leverage this dataset for lead generation, sales prospecting, business development, and contact database building. The API enables efficient extraction of contact details from any website, helping businesses streamline their outreach and contact discovery processes. The dataset is delivered in JSON format via a REST API.
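As a rough illustration of consuming such an API, the sketch below requests contact details for one site and parses the JSON response. The endpoint URL, parameter names, and response fields are hypothetical placeholders, not the provider's documented interface.

```python
# A minimal sketch of calling a contact-extraction REST API.
# The endpoint URL and response fields below are hypothetical;
# substitute the provider's documented values.
import requests

API_URL = "https://api.example.com/v1/contacts"  # hypothetical endpoint

def fetch_contacts(website: str, api_key: str) -> dict:
    """Request extracted contact details for a single website."""
    resp = requests.get(
        API_URL,
        params={"url": website},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumed response shape: {"emails": [...], "phones": [...], "social": [...]}
    return resp.json()

if __name__ == "__main__":
    data = fetch_contacts("https://example.com", api_key="YOUR_KEY")
    print(data.get("emails", []))
```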
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset collects job offers gathered by web scraping and filtered according to specific keywords, locations and times. This data gives users rich and precise search capabilities to uncover the best working arrangement for them. With the information collected, users can explore options that match their personal situation, skill set, and preferences in terms of location and schedule. The columns provide detailed information on job titles, employer names, locations, time frames, and other necessary parameters, so you can make a smart choice for your next career opportunity.
This dataset is a great resource for those looking to find an optimal work arrangement based on keyword, location and time parameters. With this information, users can quickly and easily search through job offers that best fit their needs. Here are some tips on how to use this dataset to its fullest potential:
Start by identifying what type of job offer you want to find. The keyword column helps narrow your search by letting you look for postings that contain a given word or phrase.
Next, consider where the job is located: the Ubicació (location) column tells you where in the world each posting is from, so make sure it is somewhere that suits your needs.
Finally, consider when the position is available: the Temps_Oferta (time frame) column indicates when each posting was made and whether it is a full-time, part-time, casual or temporary role, so check that it meets your requirements before applying.
Additionally, if details such as hours per week or further schedule information are important criteria, there is also information in the Horari and Temps_Oferta columns. Once all three criteria have been ticked off (keywords, location and time frame), take a look at the Empresa (company name) and Nom_Oferta (offer name) columns to get an idea of who will be employing you should you land the gig!
All these pieces of data put together should give any motivated individual everything they need to seek out an optimal work arrangement. Keep hunting, and good luck!
- Machine learning can be used to group job offers, facilitating the identification of similarities and differences between them. This could allow users to target their search for a work arrangement more specifically.
- The data can be used to compare job offerings across different areas or types of jobs, enabling users to make better informed decisions in terms of their career options and goals.
- It may also provide insight into the local job market, enabling companies and employers to identify where there is potential for new opportunities or possible trends that may previously have gone unnoticed.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
File: web_scraping_information_offers.csv

| Column name  | Description                          |
|:-------------|:-------------------------------------|
| Nom_Oferta   | Name of the job offer. (String)      |
| Empresa      | Company offering the job. (String)   |
| Ubicació     | Location of the job offer. (String)  |
| Temps_Oferta | Time of the job offer. (String)      |
| Horari       | Schedule of the job offer. (String)  |
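As a quick illustration, the pandas sketch below filters the CSV by keyword and location using the column names documented above; the search terms ("developer", "Barcelona") are example values.

```python
# A minimal pandas sketch for filtering the job-offer CSV.
import pandas as pd

df = pd.read_csv("web_scraping_information_offers.csv")

# Keyword match in the offer name plus a location match (example values).
mask = (
    df["Nom_Oferta"].str.contains("developer", case=False, na=False)
    & df["Ubicació"].str.contains("Barcelona", case=False, na=False)
)
print(df.loc[mask, ["Nom_Oferta", "Empresa", "Ubicació", "Temps_Oferta", "Horari"]])
```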
🌐 Web Scraper: Turn Any URL into AI-Ready Data
Convert any public web page into clean, structured JSON in one click. Just paste a URL and this tool scrapes, cleans, and formats the content—ready to be used in any AI or content pipeline. Whether you're building datasets for LLMs or feeding fresh content into agents, this no-code tool makes it effortless to extract high-quality data from the web.
✨ Key Features
⚡ Scrape Any Public Page – Works on blogs, websites, docs… See the full description on the dataset page: https://huggingface.co/datasets/MasaFoundation/Bittensor_Whitepaper_Webscrape_Example.
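For a sense of the scrape-clean-format flow described above, here is a minimal sketch using requests and BeautifulSoup. It is an independent illustration under those assumptions, not the tool's actual implementation.

```python
# A rough sketch of turning a public page into clean JSON
# (illustrative only; not the tool's actual code).
import json
import requests
from bs4 import BeautifulSoup

def scrape_to_json(url: str) -> str:
    """Fetch a public page and return its title and visible text as JSON."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Drop script/style tags so only readable content remains.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    record = {
        "url": url,
        "title": soup.title.string.strip() if soup.title and soup.title.string else "",
        "text": " ".join(soup.get_text(separator=" ").split()),
    }
    return json.dumps(record, ensure_ascii=False)

print(scrape_to_json("https://example.com"))
```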
We create tailor-made solutions for every customer, so there are no limits to how we can customize your scraper. You don't have to worry about buying and maintaining complex and expensive software, or hiring developers.
You can get the data on a one-time or recurring (based on your needs) basis.
Get the data in any format and to any destination you need: Excel, CSV, JSON, XML, S3, GCP, or any other.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.
Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.
Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.
Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.
Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.
Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with javascript enabled, which excludes web crawlers and API calls.
These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
This statistic shows the percentage of individuals in Austria who used the internet to create a website or blog from 2012 to 2016. In 2016, *** percent of all individuals used the internet in this way, but usage was higher among those who used the internet within the last three months, at ***** percent.
ComBase includes a systematically formatted database of quantified microbial responses to the food environment with more than 65,000 records, and is used for:

- Informing the design of food safety risk management plans
- Producing Food Safety Plans and HACCP plans
- Reducing food waste
- Assessing microbiological risk in foods

The ComBase Browser enables you to search thousands of microbial growth and survival curves that have been collated in research establishments and from publications. The ComBase Predictive Models are a collection of software tools based on ComBase data to predict the growth or inactivation of microorganisms as a function of environmental factors such as temperature, pH and water activity in broth. Interested users can also contribute growth or inactivation data via the Donate Data page, which includes instructional videos, a data template and sample, and an Excel demo file of data and macros for checking data format and syntax.

Resources in this dataset: Website Pointer to ComBase (web page, url: https://www.combase.cc/index.php/en/). ComBase is an online tool for quantitative food microbiology. Its main features are the ComBase database and ComBase models, and it can be accessed on any web platform, including mobile devices. The focus of ComBase is describing and predicting how microorganisms survive and grow under a variety of primarily food-related conditions. ComBase is a highly useful tool for food companies to understand safer ways of producing and storing foods. This includes developing new food products and reformulating foods, designing challenge test protocols, producing Food Safety plans, and helping public health organizations develop science-based food policies through quantitative risk assessment. Over 60,000 records have been deposited into ComBase, describing how food environments, such as temperature, pH, and water activity, as well as other factors (e.g. preservatives and atmosphere), affect the growth of bacteria. Each data record shows users how bacteria populations change for a particular combination of environmental factors. Mathematical models (the ComBase Predictor and Food models) were developed on systematically generated data to predict how various organisms grow or survive under various conditions.
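As an illustration of the kind of secondary predictive model such tools expose, the sketch below implements the classic Ratkowsky square-root model, in which growth rate depends on temperature. This is not ComBase's actual model, and the parameter values are made up for illustration.

```python
# Illustrative Ratkowsky square-root model: sqrt(mu) = b * (T - Tmin).
# Parameter values are invented; they do not come from ComBase.
def ratkowsky_growth_rate(temp_c: float, b: float = 0.03, t_min: float = 2.0) -> float:
    """Return a specific growth rate (1/h) as a function of temperature (deg C)."""
    if temp_c <= t_min:
        return 0.0  # no growth below the minimum growth temperature
    sqrt_mu = b * (temp_c - t_min)
    return sqrt_mu ** 2

for t in (5, 10, 20, 30):
    print(f"{t} degC -> mu = {ratkowsky_growth_rate(t):.4f} per hour")
```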
https://scoop.market.us/privacy-policy
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of Tor cell files extracted from browsing simulations using Tor Browser. The simulations cover both desktop and mobile webpages. The data collection process used the WFP-Collector tool (https://github.com/irsyadpage/WFP-Collector). All the necessary configuration to perform the simulation is detailed in the tool repository. The webpage URLs were selected by taking the first 100 websites from https://dataforseo.com/free-seo-stats/top-1000-websites. Each webpage URL was visited 90 times in each of the desktop and mobile browsing modes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Code:
Packet_Features_Generator.py & Features.py
To run this code:
pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j
- -h, --help: show this help message and exit
- -i TXTFILE: input text file
- -x X: add first X number of total packets as features
- -y Y: add first Y number of negative packets as features
- -z Z: add first Z number of positive packets as features
- -ml: output to a text file all websites in the format websiteNumber1,feature1,feature2,...
- -s S: generate samples using size S
- -j
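A minimal argparse skeleton consistent with the usage string above might look like the following. It is a sketch, not the original script; the -j flag is declared without help text because its description is truncated in the source, and the integer types are assumptions.

```python
# Sketch of a CLI matching the documented usage (not the original script).
import argparse

parser = argparse.ArgumentParser(prog="pkt_features.py")
parser.add_argument("-i", metavar="TXTFILE", required=True, help="input text file")
parser.add_argument("-x", type=int, help="add first X number of total packets as features")
parser.add_argument("-y", type=int, help="add first Y number of negative packets as features")
parser.add_argument("-z", type=int, help="add first Z number of positive packets as features")
parser.add_argument("-ml", action="store_true",
                    help="output all websites as websiteNumber1,feature1,feature2,...")
parser.add_argument("-s", type=int, help="generate samples using size S")
parser.add_argument("-j", action="store_true", required=True)  # purpose not documented above
args = parser.parse_args()
```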
Purpose:
Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.
Uses Features.py to calculate the features.
startMachineLearning.sh & machineLearning.py
To run this code:
bash startMachineLearning.sh
This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.
Options (to be edited within this file):
--evaluate-only to test 5-fold cross-validation accuracy
--test-scaling-normalization to test 6 different combinations of scalers and normalizers
Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use
--grid-search to test the best grid-search hyperparameters. Note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:'; once the best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'
Purpose:
Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest classifier on the provided data and reports results using cross-validation. These results include the best scaling and normalization options for each data set as well as the best grid-search hyperparameters based on the provided ranges.
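A condensed sketch of that flow is below, assuming the .ml file is comma-separated with the class label first (as the -ml format above suggests); the file name and hyperparameter grid are illustrative.

```python
# Sketch: scale features, train a RandomForest, evaluate with 5-fold CV,
# then grid-search hyperparameters. File name and grid are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler

data = np.loadtxt("features.ml", delimiter=",")  # assumed layout: label, feature1, ...
X, y = data[:, 1:], data[:, 0].astype(int)
X = StandardScaler().fit_transform(X)

clf = RandomForestClassifier(random_state=0)
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())

grid = GridSearchCV(clf, {"n_estimators": [100, 300], "max_depth": [None, 20]}, cv=5)
grid.fit(X, y)
print("best hyperparameters:", grid.best_params_)
```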
Data
Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results pages), and different actions taken on a virtual reality headset.
Data for this experiment was stored and analyzed as one txt file per experiment; each file contains the following (a parsing sketch follows this list):
The first number on each line is a classification number denoting which website, query, or VR action is taking place.
The remaining numbers in each line denote:
The size of a packet,
and the direction it is traveling.
negative numbers denote incoming packets
positive numbers denote outgoing packets
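A small sketch of parsing that format, assuming whitespace-separated values (adjust the split if the files are comma-delimited); the file name is taken from the Figure 4 description below.

```python
# Parse per-experiment txt traces: each line starts with a class label,
# followed by signed packet sizes (negative = incoming, positive = outgoing).
def parse_traces(path: str) -> list[dict]:
    traces = []
    with open(path) as f:
        for line in f:
            fields = line.split()  # assumed whitespace-separated
            if not fields:
                continue
            label = int(fields[0])
            sizes = [int(v) for v in fields[1:]]
            traces.append({
                "label": label,
                "incoming": [s for s in sizes if s < 0],
                "outgoing": [s for s in sizes if s > 0],
            })
    return traces

for trace in parse_traces("Virtual Reality.txt")[:3]:
    print(trace["label"], len(trace["incoming"]), len(trace["outgoing"]))
```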
Figure 4 Data
This data uses specific lines from the Virtual Reality.txt file.
The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.
The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.
The .xlsx and .csv files are identical.
Each file includes (from right to left):
The original packet data,
each line of data sorted from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,
and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 graph (a short sketch of this calculation follows).
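A minimal numpy sketch of that calculation, reading the CDF as the empirical distribution over sorted packet sizes; the packet sizes below are made-up example values.

```python
# Sort packet sizes, compute mean and standard deviation, then the
# empirical CDF (fraction of packets at or below each ordered size).
import numpy as np

sizes = np.array([-1500, -560, 120, 320, -1500, 640])  # example values only
ordered = np.sort(sizes)
mean, std = ordered.mean(), ordered.std()
print(f"mean = {mean:.1f}, std = {std:.1f}")

cdf = np.arange(1, len(ordered) + 1) / len(ordered)
for size, p in zip(ordered, cdf):
    print(f"size {size:6d} -> CDF {p:.2f}")
```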
This statistic shows the percentage of individuals in Italy who used the internet to create a website or blog from 2012 to 2016. In 2016, ***** percent of all individuals used the internet in this way, but usage was higher among those who used the internet within the last three months, at **** percent.
This statistic shows the percentage of individuals in Germany who used the internet to create a website or blog from 2012 to 2016. In 2016, **** percent of all individuals used the internet in this way, but usage was higher among those who used the internet within the last three months, at *** percent.
https://www.ibisworld.com/about/termsofuse/
The growth of the Internet since its inception has fueled strong demand and profitability for web design services, as both businesses and households increasingly conduct activities online. The pandemic accelerated this trend, forcing businesses to upgrade their digital presence amid lockdowns and remote work, which resulted in significant revenue gains for web designers in 2020. This trend continued in 2021 as the strong economic recovery boosted corporate profit and gave businesses greater funds to invest in the industry’s services. More recently, high inflation and rising interest rates have raised costs and curtailed demand, with some businesses opting for cheaper alternatives like templates rather than custom web design, contributing to a drop in revenue in 2022. Despite these challenges, rising stock prices linked to AI advancements pushed business income substantially upward, enabling further investment in web design through 2023 and 2024 and benefiting revenue. In response to shifting client expectations, web designers now prioritize mobile-first design, rapid performance, personalization and interactive content. These adaptations, along with investments in new technologies, have allowed web designers, especially smaller ones, to differentiate themselves and sustain long-term growth. Overall, revenue for web design services companies has swelled at a CAGR of 2.3% over the past five years, reaching $47.4 billion in 2025. This includes a 1.5% rise in revenue in that year.

Market saturation will limit revenue growth for website designers moving forward. With nearly all US adults now using the Internet, opportunities for finding new customers are dwindling as internet usage approaches universality. As a result, major providers may turn to mergers and acquisitions to maintain market share, while smaller companies will likely focus on niche markets or specific geographies to secure stable income. Additionally, tariffs imposed by the Trump administration could further restrain demand by increasing consumer prices, reducing disposable income and pushing the economy toward recession. In response, web designers may expand geographically to find new clients. Amid these headwinds, AI and automation technologies are transforming design workflows, increasing efficiency while fostering a greater need for skilled workers and enabling more tailored services. Companies are also adapting by prioritizing inclusivity and sustainability, attracting broader demographics and eco-conscious clients. Overall, revenue for web design services providers is forecast to inch upward at a CAGR of 1.1% over the next five years, reaching $49.9 billion in 2030.
https://www.marketresearchintellect.com/privacy-policy
Dive into Market Research Intellect's No Code Website Builder Tools Market Report, valued at USD 5.2 billion in 2024, and forecast to reach USD 12.3 billion by 2033, growing at a CAGR of 10.5% from 2026 to 2033.
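For reference, a CAGR figure like this is computed as (end / start)^(1 / years) - 1. The sketch below reuses the report's 2024 valuation and 2033 forecast; the report's 10.5% figure is quoted over the 2026 to 2033 window, so the 2024 to 2033 number comes out slightly lower.

```python
# Compound annual growth rate: CAGR = (end / start) ** (1 / years) - 1.
def cagr(start: float, end: float, years: int) -> float:
    return (end / start) ** (1 / years) - 1

# USD 5.2B (2024) -> USD 12.3B (2033), 9 years: prints ~10.0%.
print(f"{cagr(5.2, 12.3, 9):.1%}")
```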
This statistic shows the percentage of individuals in Luxembourg who used the internet to create a website or blog from 2012 to 2016. In 2016, eight percent of all individuals, and of those who used the internet in the last three months, used the internet in this way.
https://www.archivemarketresearch.com/privacy-policy
The global no-code website builder tools market is projected to reach $43.7 billion by 2033, exhibiting a CAGR of 12.4% during the forecast period. The increasing adoption of digital platforms by businesses and the growing demand for user-friendly website development tools are driving the market growth. The availability of a wide range of templates and drag-and-drop functionality makes no-code website builders accessible to individuals and businesses with limited technical expertise.

Key trends shaping the market include the integration of artificial intelligence (AI) and machine learning (ML) to enhance the user experience, the rise of headless CMS platforms that enable greater flexibility and scalability, and the growing popularity of cloud-based no-code website builders that offer convenience and cost-effectiveness. The market is segmented into various types, applications, and regions, with North America holding a significant share due to the presence of leading technology companies and a large number of small and medium-sized businesses. Major players in the market include Wix, Bubble, Webflow, Squarespace, and WordPress, among others, who are focusing on expanding their offerings, forming strategic partnerships, and investing in research and development to gain a competitive edge.

The no-code website builder tools market is experiencing exponential growth, with its value projected to reach over $17.6 billion by 2026. These tools empower non-technical individuals and businesses to create professional-looking websites without the need for programming knowledge.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Website is a dataset for object detection tasks - it contains All Website annotations for 322 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Google Chrome is a popular web browser developed by Google.
The Chrome User Experience Report is a public dataset of key user experience metrics for popular origins on the web, as experienced by Chrome users under real-world conditions.
https://bigquery.cloud.google.com/dataset/chrome-ux-report:all
For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/
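A minimal sketch of querying CrUX from Python with the BigQuery client library; the monthly table name (202306) and the form_factor.name column follow the publicly documented schema, but verify both against the documentation linked above.

```python
# Sketch: query the Chrome UX Report public dataset on BigQuery.
# Table name and column paths are assumptions based on the public schema.
from google.cloud import bigquery

client = bigquery.Client()  # requires Google Cloud credentials
sql = """
    SELECT origin, form_factor.name AS form_factor
    FROM `chrome-ux-report.all.202306`
    WHERE origin = 'https://example.com'
    LIMIT 10
"""
for row in client.query(sql).result():
    print(row.origin, row.form_factor)
```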
License: CC BY 4.0
Information was reported as correct by central government departments at 29 February 2012.
In its Structural Reform plan, the Cabinet Office committed to begin quarterly publication of the number of open websites starting in financial year 2011.
The definition of a website used here is a user-centric one. Something is counted as a separate website if it is active and either has a separate domain name or, as a subdomain, does not let the user move freely between the subsite and parent site and shows no family likeness in the design. In other words, if the user experiences it as a separate site in their normal uses of browsing, search and interaction, it is counted as one.
A website is considered closed when it ceases to be actively funded, run and managed by central government, either by packaging information and putting it in the right place for the intended audience on another website or digital channel, or by a third party taking and managing it and bearing the cost. Where appropriate, domains stay operational in order to redirect users to the UK Government Website Archive (http://www.nationalarchives.gov.uk/webarchive/).
The GOV.UK exemption process began with a web rationalisation of the government’s Internet estate to reduce the number of obsolete websites and to establish the scale of the websites that the government owns.
Not included in the number or list are websites of public corporations as listed on the Office for National Statistics website, partnerships more than half-funded by private sector, charities and national museums. Specialist closed audience functions, such as the BIS Research Councils, BIS Sector Skills Councils and Industrial Training Boards, and the Defra Levy Boards and their websites, are not included in this data. The Ministry of Defence conducted their own rationalisation of MOD and the armed forces sites as an integral part of the Website Review; military sites belonging to a particular service are excluded from this dataset. Finally, those public bodies set up by Parliament and reporting directly to the Speaker’s Committee and only reporting through a ministerial government department for the purposes of enactment of legislation are also excluded (for example, the Electoral Commission and IPSA).
Websites are listed under the department name for which the minister in HMG has responsibility, either directly through their departmental activities, or indirectly through being the minister reporting to Parliament for independent bodies set up by statute.
For re-usability, these are provided as Excel and CSV files.