100+ datasets found
  1. Global Web Data | Web Scraping Data | Job Postings Data | Source: Company...

    • datarade.ai
    .json
    Cite
    PredictLeads, Global Web Data | Web Scraping Data | Job Postings Data | Source: Company Website | 232M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-data-web-scraping-data-job-postings-dat-predictleads
    Available download formats: .json
    Dataset authored and provided by
    PredictLeads
    Area covered
    Bosnia and Herzegovina, Guadeloupe, Comoros, French Guiana, Virgin Islands (British), Northern Mariana Islands, Kuwait, El Salvador, Bonaire, Kosovo
    Description

    PredictLeads Job Openings Data provides high-quality hiring insights sourced directly from company websites - not job boards. Using advanced web scraping technology, our dataset offers real-time access to job trends, salaries, and skills demand, making it a valuable resource for B2B sales, recruiting, investment analysis, and competitive intelligence.

    Key Features:

    ✅ 232M+ Job Postings Tracked – Data sourced from 92 million company websites worldwide.
    ✅ 7.1M+ Active Job Openings – Updated in real-time to reflect hiring demand.
    ✅ Salary & Compensation Insights – Extract salary ranges, contract types, and job seniority levels.
    ✅ Technology & Skill Tracking – Identify emerging tech trends and industry demands.
    ✅ Company Data Enrichment – Link job postings to employer domains, firmographics, and growth signals.
    ✅ Web Scraping Precision – Directly sourced from employer websites for unmatched accuracy.

    Primary Attributes:

    • id (string, UUID) – Unique identifier for the job posting.
    • type (string, constant: "job_opening") – Object type.
    • title (string) – Job title.
    • description (string) – Full job description, extracted from the job listing.
    • url (string, URL) – Direct link to the job posting.
    • first_seen_at – Timestamp when the job was first detected.
    • last_seen_at – Timestamp when the job was last detected.
    • last_processed_at – Timestamp when the job data was last processed.

    Job Metadata:

    • contract_types (array of strings) – Type of employment (e.g., "full time", "part time", "contract").
    • categories (array of strings) – Job categories (e.g., "engineering", "marketing").
    • seniority (string) – Seniority level of the job (e.g., "manager", "non_manager").
    • status (string) – Job status (e.g., "open", "closed").
    • language (string) – Language of the job posting.
    • location (string) – Full location details as listed in the job description.
    • Location Data (location_data) (array of objects):
      • city (string, nullable) – City where the job is located.
      • state (string, nullable) – State or region of the job location.
      • zip_code (string, nullable) – Postal/ZIP code.
      • country (string, nullable) – Country where the job is located.
      • region (string, nullable) – Broader geographical region.
      • continent (string, nullable) – Continent name.
      • fuzzy_match (boolean) – Indicates whether the location was inferred.

    Salary Data (salary_data)

    • salary (string) – Salary range extracted from the job listing.
    • salary_low (float, nullable) – Minimum salary in original currency.
    • salary_high (float, nullable) – Maximum salary in original currency.
    • salary_currency (string, nullable) – Currency of the salary (e.g., "USD", "EUR").
    • salary_low_usd (float, nullable) – Converted minimum salary in USD.
    • salary_high_usd (float, nullable) – Converted maximum salary in USD.
    • salary_time_unit (string, nullable) – Time unit for the salary (e.g., "year", "month", "hour").

    Occupational Data (onet_data) (object, nullable)

    • code (string, nullable) – O*NET occupation code.
    • family (string, nullable) – Broad occupational family (e.g., "Computer and Mathematical").
    • occupation_name (string, nullable) – Official O*NET occupation title.

    Additional Attributes:

    • tags (array of strings, nullable) – Extracted skills and keywords (e.g., "Python", "JavaScript").
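
The attribute list above can be illustrated with a minimal Python sketch. The record below is hypothetical: its field names follow the listing, but every value is invented, and the ISO 8601 timestamp format is an assumption, since the listing does not state a format for the job openings timestamps. The helper names `parse_ts` and `summarize_job` are illustrative and not part of any PredictLeads API.

```python
import json
from datetime import datetime

# Hypothetical record: field names follow the attribute list above;
# all values are invented for illustration.
SAMPLE = """
{
  "id": "00000000-0000-0000-0000-000000000000",
  "type": "job_opening",
  "title": "Data Engineer",
  "first_seen_at": "2024-01-15T08:00:00Z",
  "last_seen_at": "2024-02-01T08:00:00Z",
  "status": "open",
  "location_data": [{"city": "Berlin", "country": "Germany", "fuzzy_match": false}],
  "salary_data": {"salary_low_usd": 70000.0, "salary_high_usd": null},
  "onet_data": null,
  "tags": ["Python", "SQL"]
}
"""


def parse_ts(value):
    # Assumption: timestamps are ISO 8601 strings with a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_job(record):
    # Missing or null objects and arrays are replaced with empty defaults.
    salary = record.get("salary_data") or {}
    onet = record.get("onet_data") or {}
    seen_for = parse_ts(record["last_seen_at"]) - parse_ts(record["first_seen_at"])
    return {
        "title": record["title"],
        "days_visible": seen_for.days,
        "cities": [loc.get("city") for loc in record.get("location_data") or []],
        "salary_range_usd": (salary.get("salary_low_usd"), salary.get("salary_high_usd")),
        "occupation": onet.get("occupation_name"),
        "tags": record.get("tags") or [],
    }


summary = summarize_job(json.loads(SAMPLE))
print(summary)
```

Treating the nullable fields defensively keeps the summary usable when a posting lacks salary or occupation data.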

    📌 Trusted by enterprises, recruiters, and investors for high-precision job market insights.

    PredictLeads Dataset: https://docs.predictleads.com/v3/guide/job_openings_dataset

  2. Web Designer Express | Graphics Multimedia & Web Design | Technology Data

    • datastore.forage.ai
    Updated Sep 22, 2024
    Cite
    (2024). Web Designer Express | Graphics Multimedia & Web Design | Technology Data [Dataset]. https://datastore.forage.ai/searchresults/?resource_keyword=web
    Dataset updated
    Sep 22, 2024
    Description

    Web Designer Express is a reputable Miami-based company that has been in business for 20 years. With a team of experienced web designers and developers, they offer a wide range of services, including web design, e-commerce development, web development, and more. Their portfolio showcases over 10,000 websites designed, with a focus on creating custom, unique solutions for each client. With a presence in Miami, Florida, they cater to businesses and individuals seeking to establish a strong online presence. As a company, Web Designer Express is dedicated to building long-lasting relationships with their clients, providing personalized service, and exceeding expectations.

  3. Website Analytics

    • catalog.data.gov
    • data.nola.gov
    • +4 more
    Updated Jun 28, 2025
    Cite
    data.nola.gov (2025). Website Analytics [Dataset]. https://catalog.data.gov/dataset/website-analytics
    Dataset updated
    Jun 28, 2025
    Dataset provided by
    data.nola.gov
    Description

    This data about nola.gov provides a window into how people are interacting with the City of New Orleans online. The data comes from a unified Google Analytics account for New Orleans. We do not track individuals and we anonymize the IP addresses of all visitors.

  4. Business Software Alliance | Web Hosting & Domain Names | Technology Data

    • datastore.forage.ai
    Updated Sep 22, 2024
    Cite
    (2024). Business Software Alliance | Web Hosting & Domain Names | Technology Data [Dataset]. https://datastore.forage.ai/searchresults/?resource_keyword=web
    Dataset updated
    Sep 22, 2024
    Description

    Business Software Alliance is a trade association that represents the world's leading software companies, including Autodesk, IBM, and Symantec. The organization's members are committed to promoting the use of legitimate software and ensuring the integrity of their intellectual property.

    As a result, the data housed on BSA's website is rich in information related to the software industry, including software licensing, anti-piracy efforts, and digital piracy statistics. The data includes information on software usage, software development, and the impact of piracy on the technology industry. With its focus on promoting legitimate software use, the data on BSA's website provides valuable insights into the global software industry.

  5. Website Statistics

    • data.wu.ac.at
    • lcc.portaljs.com
    • +2 more
    csv, pdf
    Updated Jun 11, 2018
    Cite
    Lincolnshire County Council (2018). Website Statistics [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/M2ZkZDBjOTUtMzNhYi00YWRjLWI1OWMtZmUzMzA5NjM0ZTdk
    Available download formats: csv, pdf
    Dataset updated
    Jun 11, 2018
    Dataset provided by
    Lincolnshire County Council (http://www.lincolnshire.gov.uk/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.

    • Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.

    • Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.

    • Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.

    • Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.

      Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with JavaScript enabled, which excludes web crawlers and API calls.

    These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.

  6. The Use of Web Services in Data Exchanges

    • catalog.data.gov
    • data.virginia.gov
    • +1 more
    Updated Sep 6, 2025
    Cite
    Administration for Children and Families (2025). The Use of Web Services in Data Exchanges [Dataset]. https://catalog.data.gov/dataset/the-use-of-web-services-in-data-exchanges
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    This presentation provides an overview of web services, how they use different types of patterns, and their purpose in effectively exchanging data in real time between heterogeneous systems. The presentation introduces the concept of service-oriented architecture using web services and how these web services can be used to integrate diverse repositories of data as well as to create efficient solutions for sharing and accessing data across the enterprise. The speaker for this presentation is Valter Borges, director, Information Systems, State of Connecticut Department of Children and Families. Metadata-only record linking to the original dataset. Open original dataset below.

  7. NYC Open Data Plan: Website Data

    • data.cityofnewyork.us
    • catalog.data.gov
    csv, xlsx, xml
    Updated Sep 15, 2025
    Cite
    Office of Technology and Innovation (OTI) (2025). NYC Open Data Plan: Website Data [Dataset]. https://data.cityofnewyork.us/City-Government/NYC-Open-Data-Plan-Website-Data/duz4-2gn9
    Available download formats: xlsx, csv, xml
    Dataset updated
    Sep 15, 2025
    Dataset provided by
    New York City Office of Technology and Innovation (https://www.nyc.gov/content/oti/pages/)
    Authors
    Office of Technology and Innovation (OTI)
    Description

    NOTE: To review the latest plan, make sure to filter the "Report Year" column to the latest year.

    Data on public websites maintained by or on behalf of the city agencies.

  8. Media, news and magazines data & analytics

    • datarade.ai
    .json, .csv, .xls
    Cite
    Forloop.ai, Media, news and magazines data & analytics [Dataset]. https://datarade.ai/data-products/media-news-and-magazines-data-analytics-forloop-ai
    Available download formats: .json, .csv, .xls
    Dataset provided by
    Loop Technologies AB
    Area covered
    Netherlands, Aruba, Canada, Botswana, Senegal, Suriname, Pitcairn, Albania, India, Guernsey
    Description

    With our advanced analytics tools, you can make data-driven decisions to optimize your media strategy, improve audience engagement, and maximize your advertising ROI. Our analytics tools help businesses gain insights into consumer preferences, identify market trends, and optimize their advertising efforts.

    Whether you are a media agency, a publisher, or a brand looking to enhance your media strategy, our media data and analytics solutions can help you gain a competitive edge in the market. With our products, you can access the information you need to make informed decisions about your media strategy and stay ahead of the curve.

    Sources: ČTK, Reuters, BBC, CNN, Skynews, Mirror, Sun, Bild, Dailymail, https://www.psp.cz/sqw/hp.sqw?k=1300, hl.m.Praha

  9. Internet Data Center Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 29, 2025
    Cite
    Growth Market Reports (2025). Internet Data Center Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/internet-data-center-market-global-industry-analysis
    Available download formats: csv, pdf, pptx
    Dataset updated
    Aug 29, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Internet Data Center Market Outlook



    According to our latest research, the global Internet Data Center market size stood at USD 68.3 billion in 2024, registering a robust growth trajectory. The market is forecasted to reach USD 165.7 billion by 2033, expanding at a healthy CAGR of 10.4% during the 2025-2033 period. The key growth factor driving this surge is the exponential rise in data generation, cloud computing adoption, and the proliferation of digital transformation initiatives across industries worldwide. As organizations increasingly prioritize business continuity, security, and scalability, the demand for advanced data center infrastructure is at an all-time high, shaping the future of the Internet Data Center market.
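
As a quick plausibility check, the stated figures can be related through the standard compound annual growth rate formula. The sketch below assumes nine annual compounding periods (a 2024 base value growing through 2033), which the report excerpt does not state explicitly; the function name `cagr` is illustrative.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate over the given number of periods."""
    return (end_value / start_value) ** (1 / periods) - 1


START_USD_BN = 68.3    # stated 2024 market size
END_USD_BN = 165.7     # stated 2033 forecast
PERIODS = 2033 - 2024  # assumption: nine annual compounding periods

implied_rate = cagr(START_USD_BN, END_USD_BN, PERIODS)
projected_end = START_USD_BN * (1 + 0.104) ** PERIODS

print(f"implied CAGR: {implied_rate:.2%}")
print(f"2033 value at the stated 10.4% CAGR: USD {projected_end:.1f} billion")
```

Under that assumption, the implied rate comes out close to the stated 10.4%, so the three figures are mutually consistent to within rounding.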




    One of the primary drivers fueling the growth of the Internet Data Center market is the rapid expansion of digital services and applications, which has led to an unprecedented surge in global data traffic. The proliferation of Internet of Things (IoT) devices, video streaming, e-commerce, and social media platforms has necessitated the deployment of high-capacity, low-latency data centers capable of handling massive workloads. Enterprises and service providers are investing heavily in data center modernization, focusing on energy efficiency, automation, and robust connectivity to support these evolving digital ecosystems. The growing emphasis on hybrid and multi-cloud strategies further amplifies the need for flexible and scalable data center solutions, propelling market growth.




    Another significant growth factor is the increasing adoption of artificial intelligence (AI), machine learning, and big data analytics across various sectors, including healthcare, finance, and retail. These technologies require substantial computational power and storage capabilities, driving demand for advanced data center infrastructure. Modern data centers are being designed to support high-density computing, GPU acceleration, and edge computing, enabling real-time data processing and analytics at scale. Additionally, the shift toward software-defined data centers (SDDC) and virtualization is transforming traditional data center architectures, enabling greater agility, cost-efficiency, and operational resilience. This evolution is further supported by advancements in network technologies such as 5G, which facilitate faster data transmission and improved user experiences.




    Sustainability and energy efficiency have emerged as crucial considerations in the Internet Data Center market, as organizations and governments worldwide prioritize environmental responsibility. Data centers are significant consumers of electricity, prompting the adoption of green technologies, renewable energy sources, and innovative cooling solutions to minimize carbon footprints. Regulatory mandates and industry standards are driving investments in energy-efficient hardware, intelligent power management, and sustainable building practices. Leading market players are increasingly focusing on achieving carbon neutrality and leveraging circular economy principles, which not only reduce operational costs but also enhance brand reputation and stakeholder trust. This sustainable approach is expected to shape investment decisions and technological advancements in the coming years.



    As the demand for data processing and storage continues to grow, the concept of a Hyperscale Data Center has emerged as a pivotal solution to meet these needs. Hyperscale data centers are designed to efficiently scale up resources, accommodating the vast amounts of data generated by modern digital activities. These facilities are characterized by their ability to support thousands of servers and millions of virtual machines, ensuring seamless performance and reliability. The architecture of hyperscale data centers focuses on maximizing energy efficiency and optimizing cooling systems, making them a sustainable choice for large-scale operations. As businesses increasingly rely on cloud services and big data analytics, the role of hyperscale data centers becomes ever more critical in providing the necessary infrastructure to support these advanced technologies.




    Regionally, the Asia Pacific market is witnessing remarkable growth, outpacing other regions due to rapid digitalization, government initiatives, and increasing internet penetration. Countries such as China, India, and Singapo

  10. Network Traffic Analysis: Data and Code

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jun 12, 2024
    Cite
    Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Loyola University Chicago
    Authors
    Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Code:

    Packet_Features_Generator.py & Features.py

    To run this code:

    pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

    -h, --help   show this help message and exit
    -i TXTFILE   input text file
    -x X         Add first X number of total packets as features.
    -y Y         Add first Y number of negative packets as features.
    -z Z         Add first Z number of positive packets as features.
    -ml          Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
    -s S         Generate samples using size s.
    -j

    Purpose:

    Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

    Uses Features.py to calculate the features.

    startMachineLearning.sh & machineLearning.py

    To run this code:

    bash startMachineLearning.sh

    This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.

    Options (to be edited within this file):

    --evaluate-only to test 5-fold cross-validation accuracy

    --test-scaling-normalization to test 6 different combinations of scalers and normalizers

    Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

    --grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

    Purpose:

    Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and provides results using cross-validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.

    Data

    Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

    Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:

    The first number on each line is a classification number denoting which website, query, or VR action is taking place.

    The remaining numbers in each line denote:

    The size of a packet,

    and the direction it is traveling.

    negative numbers denote incoming packets

    positive numbers denote outgoing packets
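
The line format described above can be read with a short Python sketch. It assumes the numbers on each line are separated by whitespace, which the description does not specify; the function name `parse_capture_line` and the sample line are hypothetical and not taken from the authors' code.

```python
def parse_capture_line(line):
    """Split one capture line into (label, incoming sizes, outgoing sizes)."""
    # Assumption: values are whitespace-separated integers.
    values = [int(token) for token in line.split()]
    label, packets = values[0], values[1:]
    # Negative numbers denote incoming packets; store their sizes as positives.
    incoming = [-size for size in packets if size < 0]
    # Positive numbers denote outgoing packets.
    outgoing = [size for size in packets if size > 0]
    return label, incoming, outgoing


label, incoming, outgoing = parse_capture_line("3 -1500 52 -1500 120")
print(label, incoming, outgoing)
```

Separating the two directions up front makes it straightforward to derive per-direction features later, such as the first few incoming or outgoing packet sizes.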

    Figure 4 Data

    This data uses specific lines from the Virtual Reality.txt file.

    The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

    The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

    The .xlsx and .csv files are identical.

    Each file includes (from right to left):

    The original packet data,

    each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

    and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 Graph.
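
The steps listed above (sorting, mean, standard deviation, CDF) can be sketched in Python. The description does not say whether the authors computed an empirical CDF or fitted a distribution, so the empirical form below is an assumption, as is the use of the sample standard deviation; the packet sizes are invented and the function name `empirical_cdf` is illustrative.

```python
from statistics import mean, stdev


def empirical_cdf(sizes):
    """Return (size, cumulative fraction) pairs, smallest size first."""
    ordered = sorted(sizes)
    total = len(ordered)
    # One point per packet; repeated sizes yield several points at the same x.
    return [(size, (rank + 1) / total) for rank, size in enumerate(ordered)]


sizes = [120, 52, 1500, 300]  # hypothetical packet sizes from one capture
print("mean:", mean(sizes))
print("sample standard deviation:", stdev(sizes))
print(empirical_cdf(sizes))
```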

  11. Web Data | Web Scraping Data | Technographic Data | Source: Job Openings,...

    • datarade.ai
    .json
    Cite
    PredictLeads, Web Data | Web Scraping Data | Technographic Data | Source: Job Openings, HTML and JavaScripts | 1B+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-data-web-scraping-data-technographic-da-predictleads
    Available download formats: .json
    Dataset authored and provided by
    PredictLeads
    Area covered
    New Caledonia, Marshall Islands, Micronesia (Federated States of), Nepal, Kiribati, Cook Islands, Sri Lanka, Grenada, Japan, South Africa
    Description

    PredictLeads Technographic Data is a powerful tool for B2B organizations, providing detailed technographic and firmographic insights extracted through sophisticated web scraping techniques. Unlike traditional datasets, it identifies emerging technologies in job postings, revealing real-time technology adoption trends across industries. These insights fuel technical decision-making, B2B data cleansing, account profiling, and 360-degree customer analysis.

    Use Cases:

    ✅ Technical Account Profiling – Analyze a company’s technology stack and hiring trends for better-targeted sales and marketing.
    ✅ B2B Data Cleansing – Enhance CRM and data enrichment efforts with up-to-date, verified technographic insights.
    ✅ Technology Trend Analysis – Identify high-growth industries and emerging tech adoption patterns.
    ✅ Competitive Intelligence – Assess competitor tech stacks and innovation roadmaps based on hiring activity.
    ✅ 360-Degree Customer View – Integrate firmographic and technographic data for a complete B2B customer profile.

    Key API Attributes:

    • id (string, UUID) – Unique identifier for the technology detection.
    • first_seen_at (ISO 8601 date-time) – Date when the technology was first detected.
    • last_seen_at (ISO 8601 date-time) – Last observed instance of the technology in use.
    • technology (object) – Details about the detected technology:
      • name (string) – Technology name (e.g., "AWS Lambda", "Kubernetes").
    • company (object) – Data about the company using the technology:
      • domain (string) – Company website domain.
      • company_name (string) – Full company name.
    • seen_on_job_openings (array, nullable) – List of job postings mentioning the technology, indicating hiring demand.
    • seen_on_subpages (array) – URLs of web pages where the technology was detected, providing additional context.
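
A minimal Python sketch of how records with these attributes might be grouped into per-company technology stacks follows. It assumes `name` is nested under `technology` and `domain` under `company`, as the attribute list suggests; the detections are invented and the function name `stack_by_domain` is hypothetical, not part of any PredictLeads API.

```python
from collections import defaultdict

# Hypothetical detections: field names follow the attribute list above;
# all values are invented for illustration.
DETECTIONS = [
    {"technology": {"name": "Kubernetes"},
     "company": {"domain": "example.com", "company_name": "Example Inc"}},
    {"technology": {"name": "AWS Lambda"},
     "company": {"domain": "example.com", "company_name": "Example Inc"}},
    {"technology": {"name": "Kubernetes"},
     "company": {"domain": "example.org", "company_name": "Example Org"}},
]


def stack_by_domain(detections):
    """Map each company domain to the sorted technology names detected for it."""
    stacks = defaultdict(set)
    for detection in detections:
        stacks[detection["company"]["domain"]].add(detection["technology"]["name"])
    return {domain: sorted(names) for domain, names in stacks.items()}


stacks = stack_by_domain(DETECTIONS)
print(stacks)
```

Using a set per domain means a technology detected several times for the same company is listed once.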

    📌 PredictLeads Technographic Data is the go-to solution for B2B professionals looking to optimize technical sales strategies, refine account targeting, and gain a competitive edge in technology-driven markets.

    PredictLeads Docs: https://docs.predictleads.com/v3/guide/technology_detections_dataset

  12. A web tracking data set of online browsing behavior of 2,148 users

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, txt +1
    Updated Oct 9, 2025
    Cite
    Juhi Kulshrestha; Marcos Oliveira; Orkut Karacalik; Denis Bonnay; Claudia Wagner (2025). A web tracking data set of online browsing behavior of 2,148 users [Dataset]. http://doi.org/10.5281/zenodo.4757574
    Available download formats: zip, txt, application/gzip
    Dataset updated
    Oct 9, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Juhi Kulshrestha; Marcos Oliveira; Orkut Karacalik; Denis Bonnay; Claudia Wagner
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    This anonymized data set consists of one month's (October 2018) web tracking data of 2,148 German users. For each user, the data contains the anonymized URL of each webpage the user visited, the domain of the webpage, and the category of the domain, drawn from 41 distinct categories. In total, these 2,148 users made 9,151,243 URL visits, spanning 49,918 unique domains. For each user in our data set, we have self-reported information (collected via a survey) about their gender and age.

    We acknowledge the support of Respondi AG, which provided the web tracking and survey data free of charge for research purposes, with special thanks to François Erner and Luc Kalaora at Respondi for their insights and help with data extraction.

    The data set is analyzed in the following paper:

    • Kulshrestha, J., Oliveira, M., Karacalik, O., Bonnay, D., Wagner, C. "Web Routineness and Limits of Predictability: Investigating Demographic and Behavioral Differences Using Web Tracking Data." Proceedings of the International AAAI Conference on Web and Social Media. 2021. https://arxiv.org/abs/2012.15112.

    The code used to analyze the data is also available at https://github.com/gesiscss/web_tracking.

    If you use data or code from this repository, please cite the paper above and the Zenodo link.

    Users are advised that some domains in this data set may link to potentially questionable or inappropriate content. The domains have not been individually reviewed, as content verification was not the primary objective of this data set. Therefore, user discretion is strongly recommended when accessing or scraping any content from these domains.

  13. Data Processing & Hosting & Website Operating in Ireland - Market Research...

    • ibisworld.com
    Updated Apr 15, 2025
    Cite
    IBISWorld (2025). Data Processing & Hosting & Website Operating in Ireland - Market Research Report (2015-2030) [Dataset]. https://www.ibisworld.com/ireland/industry/data-processing-hosting-website-operating/200269/
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    IBISWorld
    License

    https://www.ibisworld.com/about/termsofuse/

    Time period covered
    2015 - 2030
    Area covered
    Ireland
    Description

    This group includes the provision of infrastructure for hosting, data processing services and related activities, as well as search facilities and other portals for the Internet.

  14. Preferred web data aggregation frequency of data companies in 2011

    • statista.com
    Cite
    Statista, Preferred web data aggregation frequency of data companies in 2011 [Dataset]. https://www.statista.com/statistics/220853/frequency-with-which-companies-would-like-to-collect-web-data/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 2011 - Sep 2011
    Area covered
    United States
    Description

    The survey illustrates the frequency with which companies would like to collect web data and web content as of September 2011. In 2011, ** percent of respondents stated that they aimed to collect customer data on a daily basis.

  15. DATAANT | Custom Data Extraction | Web Scraping Data | Dataset, API | Data...

    • datarade.ai
    Cite
    Dataant, DATAANT | Custom Data Extraction | Web Scraping Data | Dataset, API | Data Parsing and Processing | Worldwide [Dataset]. https://datarade.ai/data-products/dataant-custom-data-extraction-web-scraping-data-datase-dataant
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset authored and provided by
    Dataant
    Area covered
    Algeria, Bulgaria, Israel, Uruguay, Lithuania, Niger, Morocco, Vanuatu, Yemen, Andorra
    Description

    DATAANT provides the ability to extract data from any website using its web scraping service.

    Receive raw HTML data by triggering the API or request a custom dataset from any website.

    Use the received data for:
    • data analysis
    • data enrichment
    • data intelligence
    • data comparison

    The only two parameters needed to start a data extraction project:
    • data source (website URL)
    • attributes set for extraction

    All the data can be delivered using the following:
    • One-Time delivery
    • Scheduled updates delivery
    • DB access
    • API

    All the projects are highly customizable, so our team of data specialists could provide any data enrichment.

  16. Web Data Commons - RDFa, Microdata, and Microformat Data Sets

    • webdatacommons.org
    n-quads
    Updated Oct 15, 2016
    Cite
    Christian Bizer; Robert Meusel; Anna Primpeli (2016). Web Data Commons - RDFa, Microdata, and Microformat Data Sets [Dataset]. http://webdatacommons.org/structureddata/2016-10/stats/stats.html
    Available download formats: n-quads
    Dataset updated
    Oct 15, 2016
    Authors
    Christian Bizer; Robert Meusel; Anna Primpeli
    Description

    Microformat, Microdata and RDFa data from the October 2016 Common Crawl web corpus. We found structured data within 1.24 billion HTML pages out of the 3.2 billion pages contained in the crawl (38%). These pages originate from 5.63 million different pay-level-domains out of the 34 million pay-level-domains covered by the crawl (16.5%). Altogether, the extracted data sets consist of 44.2 billion RDF quads.

  17. Italy: privacy concerns regarding personal data on the internet, by issue

    • statista.com
    Updated Sep 14, 2018
    Cite
    Statista (2018). Italy: privacy concerns regarding personal data on the internet, by issue [Dataset]. https://www.statista.com/statistics/830088/privacy-concerns-regarding-personal-data-on-the-internet-in-italy/
    Explore at:
    Dataset updated
    Sep 14, 2018
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 2016 - Aug 2016
    Area covered
    Italy
    Description

    This statistic displays the results of a 2016 survey on the share of individuals in Italy expressing privacy concerns regarding their personal data on the internet. During the survey period, **** percent of respondents reported that using the internet exposes everyone to being tracked and followed, while **** percent stated that privacy was not a real problem.

  18. Books to Scrape Dataset

    • kaggle.com
    • zenodo.org
    zip
    Updated Oct 1, 2025
    Cite
    Shahporan Priyom (2025). Books to Scrape Dataset [Dataset]. https://www.kaggle.com/datasets/shahporanpriyom/books-to-scrape-dataset
    Explore at:
    Available download formats: zip (24232 bytes)
    Dataset updated
    Oct 1, 2025
    Authors
    Shahporan Priyom
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset was prepared as a beginner's guide to web scraping and data collection. The data was collected from Books to Scrape, a website designed for beginners to learn web scraping. A companion demonstrating how the data was scraped is given here.

  19. Fabulous | Web Hosting & Domain Names | Technology Data

    • datastore.forage.ai
    Updated Sep 22, 2024
    Cite
    (2024). Fabulous | Web Hosting & Domain Names | Technology Data [Dataset]. https://datastore.forage.ai/searchresults/?resource_keyword=web
    Explore at:
    Dataset updated
    Sep 22, 2024
    Description

    Fabulous is a leading domain name registrar and developer of domain management tools. Founded by domainers for domainers, Fabulous offers competitive pricing on domain registration services, making it one of the world's cheapest registrars for professional domain owners. The company provides a comprehensive platform for domain registration, renewal, and transfer, including Whois privacy, domain monetization, parking management, and integrated sales channels.

    Fabulous also offers premium services such as a built-in sales network, parking program, and full reporting features. The company's domain management system is designed to help domainers maximize performance and financial return, with features such as bulk management tools, click tracking, and statistics reporting. With Fabulous, domain owners can register, manage, and monetize their domains effectively, making it an ideal choice for those in the domain industry.

  20. Data for: Web data extraction from System Operator's public-access website...

    • data.mendeley.com
    Updated Feb 6, 2019
    Cite
    Guzmán Díaz (2019). Data for: Web data extraction from System Operator's public-access website and its use in predicting and modeling the formation of the Spanish day-ahead electricity price [Dataset]. http://doi.org/10.17632/92rgb5cfhp.1
    Explore at:
    Dataset updated
    Feb 6, 2019
    Authors
    Guzmán Díaz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw table of ordered explanatory variables, with autoregressive elements already calculated. This table was used to train the gradient boosted regression trees (GBRT) model. The prediction results and the partial dependence analysis are also provided.

Cite
PredictLeads, Global Web Data | Web Scraping Data | Job Postings Data | Source: Company Website | 232M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-data-web-scraping-data-job-postings-dat-predictleads

Global Web Data | Web Scraping Data | Job Postings Data | Source: Company Website | 232M+ Records

Explore at:
Available download formats: .json
Dataset authored and provided by
PredictLeads
Area covered
Bosnia and Herzegovina, Guadeloupe, Comoros, French Guiana, Virgin Islands (British), Northern Mariana Islands, Kuwait, El Salvador, Bonaire, Kosovo
Description

PredictLeads Job Openings Data provides high-quality hiring insights sourced directly from company websites - not job boards. Using advanced web scraping technology, our dataset offers real-time access to job trends, salaries, and skills demand, making it a valuable resource for B2B sales, recruiting, investment analysis, and competitive intelligence.

Key Features:

✅ 232M+ Job Postings Tracked – Data sourced from 92 million company websites worldwide.
✅ 7.1M+ Active Job Openings – Updated in real time to reflect hiring demand.
✅ Salary & Compensation Insights – Extract salary ranges, contract types, and job seniority levels.
✅ Technology & Skill Tracking – Identify emerging tech trends and industry demands.
✅ Company Data Enrichment – Link job postings to employer domains, firmographics, and growth signals.
✅ Web Scraping Precision – Directly sourced from employer websites for unmatched accuracy.

Primary Attributes:

  • id (string, UUID) – Unique identifier for the job posting.
  • type (string, constant: "job_opening") – Object type.
  • title (string) – Job title.
  • description (string) – Full job description, extracted from the job listing.
  • url (string, URL) – Direct link to the job posting.
  • first_seen_at – Timestamp when the job was first detected.
  • last_seen_at – Timestamp when the job was last detected.
  • last_processed_at – Timestamp when the job data was last processed.

Job Metadata:

  • contract_types (array of strings) – Type of employment (e.g., "full time", "part time", "contract").
  • categories (array of strings) – Job categories (e.g., "engineering", "marketing").
  • seniority (string) – Seniority level of the job (e.g., "manager", "non_manager").
  • status (string) – Job status (e.g., "open", "closed").
  • language (string) – Language of the job posting.
  • location (string) – Full location details as listed in the job description.

Location Data (location_data) (array of objects)

  • city (string, nullable) – City where the job is located.
  • state (string, nullable) – State or region of the job location.
  • zip_code (string, nullable) – Postal/ZIP code.
  • country (string, nullable) – Country where the job is located.
  • region (string, nullable) – Broader geographical region.
  • continent (string, nullable) – Continent name.
  • fuzzy_match (boolean) – Indicates whether the location was inferred.

Salary Data (salary_data)

  • salary (string) – Salary range extracted from the job listing.
  • salary_low (float, nullable) – Minimum salary in original currency.
  • salary_high (float, nullable) – Maximum salary in original currency.
  • salary_currency (string, nullable) – Currency of the salary (e.g., "USD", "EUR").
  • salary_low_usd (float, nullable) – Converted minimum salary in USD.
  • salary_high_usd (float, nullable) – Converted maximum salary in USD.
  • salary_time_unit (string, nullable) – Time unit for the salary (e.g., "year", "month", "hour").

Occupational Data (onet_data) (object, nullable)

  • code (string, nullable) – O*NET occupation code.
  • family (string, nullable) – Broad occupational family (e.g., "Computer and Mathematical").
  • occupation_name (string, nullable) – Official O*NET occupation title.

Additional Attributes:

  • tags (array of strings, nullable) – Extracted skills and keywords (e.g., "Python", "JavaScript").
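
The attribute list above can be read as a record schema. The following is a minimal Python sketch of loading one such record, not the provider's client code. The field names come from the list; the sample values, the `JobOpening` class, the `parse_job_opening` helper, and the assumption that `salary_data`, `location_data`, and `onet_data` arrive as nested JSON values under those keys are all hypothetical and should be checked against the PredictLeads documentation.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sample record: field names follow the attribute list above,
# but the values and the exact nesting are assumptions for illustration.
SAMPLE = """
{
  "id": "00000000-0000-0000-0000-000000000000",
  "type": "job_opening",
  "title": "Data Engineer",
  "url": "https://example.com/careers/data-engineer",
  "status": "open",
  "contract_types": ["full time"],
  "location_data": [
    {"city": "Berlin", "country": "Germany", "fuzzy_match": false}
  ],
  "salary_data": {
    "salary_low": 60000.0,
    "salary_high": 80000.0,
    "salary_currency": "EUR",
    "salary_low_usd": null,
    "salary_high_usd": null,
    "salary_time_unit": "year"
  },
  "onet_data": null,
  "tags": ["Python", "SQL"]
}
"""


@dataclass
class JobOpening:
    """Flattened view of the fields most analyses need (hypothetical type)."""
    id: str
    title: str
    url: str
    status: Optional[str] = None
    cities: list = field(default_factory=list)
    salary_low: Optional[float] = None
    salary_high: Optional[float] = None
    salary_currency: Optional[str] = None
    tags: list = field(default_factory=list)


def parse_job_opening(raw: dict) -> JobOpening:
    """Build a JobOpening, tolerating missing or null nested objects."""
    if raw.get("type") != "job_opening":
        raise ValueError("not a job_opening record")
    # Nullable fields may be absent or null, so fall back to empty containers.
    salary = raw.get("salary_data") or {}
    locations = raw.get("location_data") or []
    return JobOpening(
        id=raw["id"],
        title=raw["title"],
        url=raw["url"],
        status=raw.get("status"),
        cities=[loc["city"] for loc in locations if loc.get("city")],
        salary_low=salary.get("salary_low"),
        salary_high=salary.get("salary_high"),
        salary_currency=salary.get("salary_currency"),
        tags=raw.get("tags") or [],
    )


job = parse_job_opening(json.loads(SAMPLE))
print(job.title, job.cities, job.salary_low, job.salary_currency)
```

Because many attributes are listed as nullable, the sketch treats every nested object as optional rather than assuming it is present.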

📌 Trusted by enterprises, recruiters, and investors for high-precision job market insights.

PredictLeads Dataset: https://docs.predictleads.com/v3/guide/job_openings_dataset
