NOTE: To review the latest plan, make sure to filter the "Report Year" column to the latest year.
Data on public websites maintained by or on behalf of the city agencies.
This data about nola.gov provides a window into how people are interacting with the City of New Orleans online. The data comes from a unified Google Analytics account for New Orleans. We do not track individuals and we anonymize the IP addresses of all visitors.
PredictLeads Key Customers Data provides essential business intelligence by analyzing company relationships, uncovering vendor partnerships, client connections, and strategic affiliations through advanced web scraping and logo recognition. This dataset captures business interactions directly from company websites, offering valuable insights into market positioning, competitive landscapes, and growth opportunities.
Use Cases:
✅ Account Profiling – Gain a 360-degree customer view by mapping company relationships and partnerships.
✅ Competitive Intelligence – Track vendor-client connections and business affiliations to identify key industry players.
✅ B2B Lead Targeting – Prioritize leads based on their business relationships, improving sales and marketing efficiency.
✅ CRM Data Enrichment – Enhance company records with detailed key customer data, ensuring data accuracy.
✅ Market Research – Identify emerging trends and industry networks to optimize strategic planning.
Key API Attributes:
📌 PredictLeads Key Customers Data is an indispensable tool for B2B sales, marketing, and market intelligence teams, providing actionable relationship insights to drive targeted outreach, competitor tracking, and strategic decision-making.
PredictLeads Docs: https://docs.predictleads.com/v3/guide/connections_dataset
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.
Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.
Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.
Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.
Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.
Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with javascript enabled, which excludes web crawlers and API calls.
These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Code:
Packet_Features_Generator.py & Features.py
To run this code:
pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j
-h, --help  show this help message and exit
-i TXTFILE  input text file
-x X        add the first X total packets as features
-y Y        add the first Y negative packets as features
-z Z        add the first Z positive packets as features
-ml         output to a text file all websites in the format websiteNumber1,feature1,feature2,...
-s S        generate samples using size S
-j
Purpose:
Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.
Uses Features.py to calculate the features.
startMachineLearning.sh & machineLearning.py
To run this code:
bash startMachineLearning.sh
This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.
Options (to be edited within this file):
--evaluate-only to test 5 fold cross validation accuracy
--test-scaling-normalization to test 6 different combinations of scalers and normalizers
Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use
--grid-search to test grid-search hyperparameters. Note: the candidate hyperparameters must be added to train_model under 'if not evaluateOnly:'; once the best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'
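The scaler/normalizer sweep that --test-scaling-normalization describes could be sketched roughly as below. This is a hypothetical scikit-learn illustration, not the actual machineLearning.py code; the assumption that the six combinations are three scalers each with and without an extra Normalizer step is the author's guess at an interpretation.

```python
# Hypothetical sketch of testing 6 scaler/normalizer combinations with
# 5-fold cross validation, as the flags above describe. Names and the
# exact combination set are assumptions, not the repository's own code.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import (MinMaxScaler, Normalizer, RobustScaler,
                                   StandardScaler)


def best_preprocessing(X, y):
    """Return the (scaler name, use_normalizer) pair with the best
    mean 5-fold cross-validation accuracy."""
    scalers = [StandardScaler(), MinMaxScaler(), RobustScaler()]
    results = {}
    for scaler in scalers:
        for use_normalizer in (False, True):
            steps = [scaler] + ([Normalizer()] if use_normalizer else [])
            pipe = make_pipeline(
                *steps, RandomForestClassifier(n_estimators=25, random_state=0)
            )
            score = cross_val_score(pipe, X, y, cv=5).mean()
            results[(type(scaler).__name__, use_normalizer)] = score
    return max(results, key=results.get)
```

Once the winning combination is found, it would be hard-coded into the data_preprocessing function, as the note above suggests.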
Purpose:
Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a Random Forest classifier on the provided data and reports results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters from the provided ranges.
Data
Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, issuing different Google search queries (collected in the form of their autocomplete results and their results page), and performing different actions on a virtual reality headset.
Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:
The first number is a classification number denoting which website, query, or VR action is taking place.
The remaining numbers in each line denote:
The size of a packet,
and the direction it is traveling.
negative numbers denote incoming packets
positive numbers denote outgoing packets
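The line format above can be parsed with a few lines of Python. This is a hypothetical sketch for illustration; the function name and return shape are the author's choices, not code from Packet_Features_Generator.py.

```python
# Hypothetical parser for one line of the packet-capture text format:
# the first number is the class label; the remaining numbers are signed
# packet sizes (negative = incoming, positive = outgoing).

def parse_line(line):
    numbers = [int(tok) for tok in line.split()]
    label, packets = numbers[0], numbers[1:]
    incoming = [p for p in packets if p < 0]
    outgoing = [p for p in packets if p > 0]
    return label, incoming, outgoing


label, incoming, outgoing = parse_line("3 -1500 420 -60 1350")
```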
Figure 4 Data
This data uses specific lines from the Virtual Reality.txt file.
The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.
The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.
The .xlsx and .csv files are identical.
Each file includes (from right to left):
The original packet data,
each line of data sorted from smallest to largest packet size, used to calculate the mean and standard deviation of each packet capture,
and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 graph.
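The per-capture statistics described above could be computed along these lines. This is a minimal sketch of one common empirical-CDF convention; the spreadsheet's exact formulas are not shown in the source and may differ.

```python
# Hypothetical sketch: sort a capture's packet sizes, compute mean and
# standard deviation, and build an empirical CDF (fraction of packets at
# or below each size). Assumes at least two packets per capture.
import statistics


def capture_stats(packet_sizes):
    ordered = sorted(packet_sizes)
    mean = statistics.mean(ordered)
    stdev = statistics.stdev(ordered)  # sample standard deviation
    n = len(ordered)
    cdf = [(size, (i + 1) / n) for i, size in enumerate(ordered)]
    return mean, stdev, cdf


mean, stdev, cdf = capture_stats([-1500, 420, -60, 1350])
```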
PredictLeads Job Openings Data provides high-quality hiring insights sourced directly from company websites - not job boards. Using advanced web scraping technology, our dataset offers real-time access to job trends, salaries, and skills demand, making it a valuable resource for B2B sales, recruiting, investment analysis, and competitive intelligence.
Key Features:
✅ 214M+ Job Postings Tracked – Data sourced from 92 million company websites worldwide.
✅ 7.1M+ Active Job Openings – Updated in real time to reflect hiring demand.
✅ Salary & Compensation Insights – Extract salary ranges, contract types, and job seniority levels.
✅ Technology & Skill Tracking – Identify emerging tech trends and industry demands.
✅ Company Data Enrichment – Link job postings to employer domains, firmographics, and growth signals.
✅ Web Scraping Precision – Directly sourced from employer websites for unmatched accuracy.
Primary Attributes:
Job Metadata:
Salary Data (salary_data)
Occupational Data (onet_data) (object, nullable)
Additional Attributes:
📌 Trusted by enterprises, recruiters, and investors for high-precision job market insights.
PredictLeads Dataset: https://docs.predictleads.com/v3/guide/job_openings_dataset
https://webtechsurvey.com/terms
A complete list of live websites using the data-urls technology, compiled through global website indexing conducted by WebTechSurvey.
Dataset Card for techchefz-website-data-v8
This dataset has been created with Argilla. As shown in the sections below, this dataset can be loaded into your Argilla server as explained in Load with Argilla, or used directly with the datasets library in Load with datasets.
Using this dataset with Argilla
To load with Argilla, install it with pip install argilla --upgrade and then use the following code:
import argilla as rg
ds =… See the full description on the dataset page: https://huggingface.co/datasets/Shashwat13333/techchefz-website-data-v8.
The website allows the public full access to the 1950 Census images, census maps and descriptions.
Business Software Alliance is a trade association that represents the world's leading software companies, including Autodesk, IBM, and Symantec. The organization's members are committed to promoting the use of legitimate software and ensuring the integrity of their intellectual property.
As a result, the data housed on BSA's website is rich in information related to the software industry, including software licensing, anti-piracy efforts, and digital piracy statistics. The data includes information on software usage, software development, and the impact of piracy on the technology industry. With its focus on promoting legitimate software use, the data on BSA's website provides valuable insights into the global software industry.
This survey is for you to let us know why you came to our website, if you found what you were looking for, and if there is anything we can do to improve our data selection.
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, driven by increased demand during the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept, though: just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
Updates to Website: (Please add new items at the top of this description with the date of the website change)
May 9, 2012: Uploaded experimental data in MATLAB format for HIRENASD
November 8, 2011: New grids, experimental data for HIRENASD configuration, new FEM for HIRENASD configuration. (JHeeg)
Oct 13: Uploaded BSCW grids (VGRID) (PChwalowski)
Oct 5: Added HIRENASD experimental data for test points #159 and #132 (JHeeg, PChwalowski)
https://www.ibisworld.com/about/termsofuse/
This group includes the provision of infrastructure for hosting, data processing services and related activities, as well as search facilities and other portals for the Internet.
Product provided by Wappalyzer. Instant access to website technology stacks.
Lookup API Perform near-instant technology lookups with the Lookup API. Results are fetched from our comprehensive database of millions of websites. If we haven't seen a domain before, we'll index it immediately and report back within minutes.
Set of data (qualitative and quantitative parameters) and screenshots of Web pages from all countries in the world.
The United States Fish and Wildlife Service (USFWS) National Wild Fish Health Survey Database (NWFHSDb) has been available to the public since September 2001. The database contains data on pathogen occurrence in free-ranging (wild) populations of fish. This data is collected via the National Wild Fish Health Survey, initiated in 1996 as a collaborative effort among natural resource agencies. The survey is maintained and managed by the nine USFWS National Fish Health Centers. The database is part of an effort to create an information system that will be a valuable tool for the management, protection, and recovery of aquatic ecosystems. The NWFHSDb consists of two distinct components: an internal database maintained and utilized by the Fish Health Centers for entering, tracking, and reporting data, and this publicly accessible website. Data from each Fish Health Center is available on this site for display and download. The NWFHSDb displays pathogen distribution information and is based on the spatial data generated by the Fish Health Centers. The NWFHSDb itself is a geographic information system (GIS) designed to be accessed via a web browser. It offers users the ability to obtain maps of NWFHS data based on user-defined queries. Individual case reports are available for each record and search results may be downloaded in several formats for further analysis. The feature layer and related tables contain data from 2021-present.
Depending on data integration issues, data may not be complete. Please reach out to the identified Fish Health Center for questions or more information.
According to a May 2023 survey of internet users in the United States, more than 80 percent of social media users classified as most knowledgeable in data privacy and cybersecurity topics had the experience of changing the privacy settings on their social media accounts. Among the users who ranked as least familiar with digital privacy topics, around 50 percent modified the privacy settings of their social media accounts. Furthermore, most knowledgeable users were more likely to turn off cookies or website tracking.