CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
List of 504,038 domains of Italy found to contain Google Analytics.
The front page of each Italy-related domain was accessed over HTTPS or HTTP and analysed with webbkoll and jq to gather data about third-party requests, cookies and other privacy-invasive features. Together with the actual URL visited, the user/property ID is provided for 495,663 domains (extracted either from the cookies deposited or from the URLs of requests to Google Analytics). MX and TXT records for the domains are also provided.
The most common ID found was 23LNSPS7Q6, referenced by over 35k domains (seemingly associated with italiaonline.it). The most common responding IP addresses were three AWS IPv4 addresses (over 40k domains) and two Cloudflare IPv6 addresses (over 12k domains).
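For illustration, here is a minimal Python sketch of the kind of ID extraction described above: pulling a Google Analytics measurement/property identifier out of a request URL or a GA4-style cookie name. The exact webbkoll output fields are not documented here, so the input strings and the regular expression are assumptions, not the project's actual pipeline.

```python
import re

# Google Analytics identifiers typically look like UA-12345678-1 (Universal
# Analytics) or G-XXXXXXXXXX (GA4 measurement IDs). The pattern is an
# assumption about what the crawl extracted, not the dataset's actual code.
GA_ID_RE = re.compile(r"\b(UA-\d{4,10}(?:-\d{1,4})?|G-[A-Z0-9]{6,12})\b")

def extract_ga_ids(text: str) -> set:
    """Return all GA-looking identifiers found in a URL or cookie string."""
    return set(GA_ID_RE.findall(text))

# Hypothetical collect-request URL and GA4 cookie name.
request_url = "https://www.google-analytics.com/g/collect?v=2&tid=G-23LNSPS7Q6&cid=555"
cookie_name = "_ga_23LNSPS7Q6"

print(extract_ga_ids(request_url))                              # {'G-23LNSPS7Q6'}
print(extract_ga_ids("G-" + cookie_name.removeprefix("_ga_")))  # {'G-23LNSPS7Q6'}
```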
This statistic shows the share of data analytics firms in India that rated domain knowledge as critically important in 2016, by market position. In that year, 100 percent of leading firms in the data analytics industry rated domain knowledge as critically important for their business.
DomainIQ is a comprehensive global Domain Name dataset for organizations that want to build cyber security, data cleaning and email marketing applications. The dataset consists of the DNS records for over 267 million domains, updated daily, representing more than 90% of all public domains in the world.
The data is enriched by over thirty unique data points, including identifying the mailbox provider for each domain and using AI based predictive analytics to identify elevated risk domains from both a cyber security and email sending reputation perspective.
DomainIQ from Datazag offers layered intelligence through a highly flexible API and as a dataset, available for both cloud and on-premises applications. Standard formats include CSV, JSON, Parquet, and DuckDB.
Custom options are available for any other file or database format. With daily updates and continuous research from Datazag, organizations can develop market-leading cyber security, data cleaning and email marketing applications supported by comprehensive and accurate data. Data updates are available on a daily, weekly or monthly basis; API data is updated daily.
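As a usage sketch only — the actual DomainIQ schema and delivery file names are not documented here, so the file name and column names below are hypothetical — loading a Parquet delivery with DuckDB might look like this:

```python
import duckdb

# Hypothetical file name and columns; the real DomainIQ schema may differ.
con = duckdb.connect()
risky = con.execute(
    """
    SELECT domain, mailbox_provider, risk_score
    FROM read_parquet('domainiq_daily.parquet')
    WHERE risk_score >= 0.8
    ORDER BY risk_score DESC
    LIMIT 20
    """
).df()
print(risky)
```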
DataForSEO Labs API offers three powerful keyword research algorithms and historical keyword data:
• Related Keywords from the “searches related to” element of Google SERP.
• Keyword Suggestions that match the specified seed keyword with additional words before, after, or within the seed key phrase.
• Keyword Ideas that fall into the same category as specified seed keywords.
• Historical Search Volume with current cost-per-click and competition values.
Based on in-market categories of Google Ads, you can get keyword ideas from the relevant Categories For Domain and discover relevant Keywords For Categories. You can also obtain Top Google Searches with AdWords and Bing Ads metrics, product categories, and Google SERP data.
You will find well-rounded ways to scout competitors:
• Domain Whois Overview with ranking and traffic info from organic and paid search.
• Ranked Keywords that any domain or URL has positions for in SERP.
• SERP Competitors and the rankings they hold for the keywords you specify.
• Competitors Domain with a full overview of its rankings and traffic from organic and paid search.
• Domain Intersection keywords for which both specified domains rank within the same SERPs.
• Subdomains for the target domain you specify along with the ranking distribution across organic and paid search.
• Relevant Pages of the specified domain with rankings and traffic data.
• Domain Rank Overview with ranking and traffic data from organic and paid search.
• Historical Rank Overview with historical data on rankings and traffic of the specified domain from organic and paid search.
• Page Intersection keywords for which the specified pages rank within the same SERP.
All DataForSEO Labs API endpoints operate in Live mode: results are returned in the response immediately after you send the necessary parameters in a POST request.
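For instance, a hedged Python sketch of a Live-mode call is shown below. The endpoint path and payload fields follow DataForSEO's documented v3 conventions as an assumption; verify them against the current API documentation and replace the placeholder credentials.

```python
import base64
import json
from urllib.request import Request, urlopen

# Placeholder credentials; DataForSEO uses HTTP Basic auth with the API login/password.
login, password = "your_login", "your_password"
auth = base64.b64encode(f"{login}:{password}".encode()).decode()

# Assumed endpoint for "Related Keywords" in Live mode; check the current docs.
url = "https://api.dataforseo.com/v3/dataforseo_labs/google/related_keywords/live"
payload = [{"keyword": "domain analytics", "location_code": 2840, "language_code": "en"}]

req = Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Authorization": f"Basic {auth}", "Content-Type": "application/json"},
)
with urlopen(req) as resp:
    result = json.load(resp)

# Responses wrap results in a "tasks" array; the field names here are assumptions.
print(result["tasks"][0]["status_message"])
```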
The limit is 2,000 API calls per minute; contact our support team if your project requires higher rates.
We offer well-rounded API documentation, GUI for API usage control, comprehensive client libraries for different programming languages, free sandbox API testing, ad hoc integration, and deployment support.
We have a pay-as-you-go pricing model. You simply add funds to your account and use them to get data. The account balance doesn't expire.
https://whoisfreaks.com/terms
The expiring and deleted domains statistics cover both generic top-level domains (gTLDs) and country-code top-level domains (ccTLDs). This dataset helps you stay up to date and make data-driven decisions in the domain industry based on daily updates.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Major US Open Data Domains’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/98e060dc-3da0-45e9-bf33-4a37a98ded89 on 27 January 2022.
--- Dataset description provided by original source is as follows ---
An incomplete collection of open data domains throughout the U.S. (intended for comparison with King County open data)
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains DNS records, IP-related features, WHOIS/RDAP information, information from TLS handshakes and certificates, and GeoIP information for 368,956 benign domains from Cisco Umbrella, 461,338 benign domains observed in actual CESNET network traffic, 164,425 phishing domains from the PhishTank and OpenPhish services, and 100,809 malware domains from sources such as ThreatFox, The Firebog, and the MISP threat intelligence platform. The ground truth for the phishing dataset was double-checked against the VirusTotal (VT) service: domain names not considered malicious by VT were removed from the phishing and malware datasets, and benign domain names considered risky by VT were removed from the benign datasets. The data was collected between March 2023 and July 2024; the final assessment of the data was conducted in August 2024.
The dataset is useful for cybersecurity research, e.g. statistical analysis of domain data or feature extraction for training machine-learning-based classifiers for phishing and malware website detection.
The dataset was created using software available in the associated GitHub repository nesfit/domainradar-dib.
The data is located in the following individual files:
Both files contain a JSON array of records generated using mongoexport (in the MongoDB Extended JSON (v2) format in Relaxed Mode). The following table documents the structure of a record. Please note that:
Field name | Field type | Nullable | Description
domain_name | String | No | The evaluated domain name
url | String | No | The source URL for the domain name
evaluated_on | Date | No | Date of last collection attempt
source | String | No | An identifier of the source
sourced_on | Date | No | Date of ingestion of the domain name
dns | Object | Yes | Data from DNS scan
rdap | Object | Yes | Data from RDAP or WHOIS
tls | Object | Yes | Data from TLS handshake
ip_data | Array of Objects | Yes | Array of data objects capturing the IP addresses related to the domain name
malware_type | String | No | The malware type/family or “unknown” (only present in malware.json)

DNS data (dns field)
A | Array of Strings | No | Array of IPv4 addresses
AAAA | Array of Strings | No | Array of IPv6 addresses
TXT | Array of Strings | No | Array of raw TXT values
CNAME | Object | No | The CNAME target and related IPs
MX | Array of Objects | No | Array of objects with the MX target hostname, priority and related IPs
NS | Array of Objects | No | Array of objects with the NS target hostname and related IPs
SOA | Object | No | All the SOA fields, present if found at the target domain name
zone_SOA | Object | No | The SOA fields of the target’s zone (closest point of delegation), present if found and not a record in the target domain directly
dnssec | Object | No | Flags describing the DNSSEC validation result for each record type
ttls | Object | No | The TTL values for each record type
remarks | Object | No | The zone domain name and DNSSEC flags

RDAP data (rdap field)
copyright_notice | String | No | RDAP/WHOIS data usage copyright notice
dnssec | Bool | No | DNSSEC presence flag
entitites | Object | No | An object with various arrays representing the found related entity types (e.g. abuse, admin, registrant). The arrays contain objects describing the individual entities.
expiration_date | Date | Yes | The current date of expiration
handle | String | No | RDAP handle
last_changed_date | Date | Yes | The date when the domain was last changed
name | String | No |
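A minimal Python sketch for loading one of these exports and tallying records by malware type, using the fields documented in the table above. Only malware.json is named explicitly in the table; treat the file name as an assumption for the other exports.

```python
import json
from collections import Counter

# mongoexport with --jsonArray writes a single JSON array in Relaxed Extended
# JSON, so the standard json module can read it directly (dates appear as
# {"$date": ...} objects, which are left untouched here).
with open("malware.json", encoding="utf-8") as f:   # file name from the table above
    records = json.load(f)

by_type = Counter(r.get("malware_type", "unknown") for r in records)
print(by_type.most_common(10))

# Reaching into the nested DNS data documented above:
with_ipv4 = sum(1 for r in records if r.get("dns") and r["dns"].get("A"))
print(f"{with_ipv4} of {len(records)} records have at least one A record")
```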
PredictLeads Key Customers Data offers a critical technical resource for B2B operations, focusing on capturing detailed insights about business relationships directly from company websites. By leveraging advanced web scraping technologies and innovative logo data recognition, we provide extensive Domain Name Data, Logo Data, Company Data, and Business Website Data. This dataset is crucial for executing sophisticated Sentiment Analysis, creating a 360-degree Customer View, enhancing Account Profiling, conducting in-depth Company Analysis, and supporting comprehensive Analytics.
Key Technical Features for B2B Operations:
➡️ Advanced Web Scraping and Logo Data Techniques: PredictLeads employs cutting-edge technologies to detect and analyze key customers represented through logos and mentions on business websites, including case studies and partner pages.
➡️ Rich Domain Name and Company Data: Access detailed information on business relationships and company affiliations that are crucial for analyzing market positions and influence.
➡️ Comprehensive Business Website Data: Utilize data gathered from company websites to gain insights into their operational networks, partnerships, and customer relationships.
Enhancing B2B Strategies with PredictLeads Data:
➡️ 360-Degree Customer Views: Develop comprehensive views of your customers by integrating detailed key customers data, revealing not just direct relationships but also extended networks.
➡️ Account Profiling: Enhance your account profiling efforts by using our connections data to understand the breadth and depth of a company's market engagements and partnerships.
➡️ Sentiment Analysis: Apply sentiment analysis techniques to the data collected from business websites and news sources to assess the sentiment surrounding business relationships and market moves.
➡️ Company Analysis: Leverage our detailed company and business website data to perform in-depth analyses of company strategies, growth potential, and market influence.
➡️ Advanced Analytics: Utilize our comprehensive dataset in your B2B data cleansing processes and analytical models to ensure data accuracy and relevancy in your CRM and marketing automation platforms.
Strategic Technical Applications in B2B:
➡️ Informed Decision-Making: Empower your technical teams with data that highlights strategic key customers and market dynamics, enhancing strategic initiatives and business outcomes.
➡️ Enhanced Data Reliability for Technical Operations: Our rigorous data collection and validation processes ensure you work with the most reliable and relevant data, supporting critical assessments and business operations.
➡️ Competitive and Market Analysis: Utilize our comprehensive data to conduct detailed analyses of competitors and market trends, providing a strategic edge in planning and execution.
Why PredictLeads Key Customers Data is Essential for Technical B2B Teams:
✅ Designed for Technical Precision: Our solutions are meticulously crafted to meet the specific needs of technical teams, offering unparalleled depth and applicability.
✅ Up-to-Date and Comprehensive: Continuous updates and broad coverage ensure that our key customers data captures the dynamic nature of global business environments, providing timely and essential insights.
✅ Trusted by Industry Leaders: Recognized for its robust data architecture and precision, PredictLeads is relied upon by technical analysts and data scientists across industries to guide their strategy and operations.
PredictLeads Key Customers Data is a tool for B2B organizations that rely on deep technical insights to steer their strategic and operational directives. By integrating the key customers data into your systems, you enhance your capacity for informed decision-making, ensuring robust technical operations and strategic advantage in a competitive marketplace.
E-commerce companies measure the interactions of online shoppers with products or services throughout the entire shopping experience. As of June 2023, Google's plug-in was the most used e-commerce analytics technology, being active on over 61,000 e-commerce sites worldwide. CM Commerce and AddShoppers followed in the ranking, with 5,396 and 4,818 domains, respectively.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
555 manually labeled metamodels mined from GitHub in April 2017.
Domains: (1) bibliography, (2) conference management, (3) bug/issue tracker, (4) build systems, (5) document/office products, (6) requirement/use case, (7) database/sql, (8) state machines, (9) petri nets
Procedure for constructing the dataset: fully manual, by searching for certain keywords and regexes (e.g. "state" and "transition" for state machines) in the metamodels and inspecting the results for inclusion.
Format for the file names: ABSINDEX_CLUSTER_ITEMINDEX_name_hash.ecore
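As an illustration of this naming scheme, the small Python parser below splits a file name into its parts. The assumptions (numeric indices, the hash being the last underscore-separated token before the .ecore extension, and the example file name itself) are hypothetical.

```python
import re

# ABSINDEX_CLUSTER_ITEMINDEX_name_hash.ecore — the name part may itself
# contain underscores, so it is matched greedily up to the final hash token.
FILENAME_RE = re.compile(
    r"^(?P<abs_index>\d+)_(?P<cluster>\d+)_(?P<item_index>\d+)_(?P<name>.+)_(?P<hash>[0-9a-fA-F]+)\.ecore$"
)

def parse_metamodel_filename(filename: str) -> dict:
    m = FILENAME_RE.match(filename)
    if not m:
        raise ValueError(f"unexpected file name format: {filename}")
    return m.groupdict()

# Hypothetical example file name.
print(parse_metamodel_filename("12_8_3_simple_state_machine_9f86d081.ecore"))
```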
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon nor mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analyses will be an enabler for transforming scientific inquiries.

Building on the HydroShare repository’s established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing time spent choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS); loading of data into performant structures keyed to specific scientific data types that integrate with existing visualization, analysis, and data science capabilities available in Python; and writing analysis results back to HydroShare for sharing and eventual publication. These capabilities reduce the technical burden on scientists of creating a computational environment for executing analyses, because the packages are installed and maintained within CUAHSI’s HydroShare-linked JupyterHub server. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows.

The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed within a Python environment on any computer running Microsoft Windows, Apple macOS, or Linux from the Python Package Index using the pip utility. They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments such as Google Colaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available on GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.
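As a brief, hedged sketch of how the two packages are typically used together (the NWIS site number, dates, HydroShare resource ID, and credentials below are placeholders, and the exact call signatures should be checked against each package's documentation):

```python
# pip install hsclient dataretrieval
import dataretrieval.nwis as nwis
from hsclient import HydroShare

# Retrieve daily-value streamflow records from NWIS (placeholder site and dates).
flow, metadata = nwis.get_dv(sites="03339000", start="2022-01-01", end="2022-12-31")
print(flow.head())

# Authenticate against HydroShare and download a resource for local analysis
# (placeholder resource ID and credentials).
hs = HydroShare(username="your_user", password="your_password")
resource = hs.resource("0123456789abcdef0123456789abcdef")
resource.download("data/")
```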
This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/
International Journal of Data Science and Analytics - Data Science has been established as an important emergent scientific field and paradigm driving research evolution in such disciplines as statistics, computing science and intelligence science, and practical transformation in such domains as science, engineering, the public sector, business, social science, and lifestyle. The field encompasses the larger areas of artificial intelligence, data analytics, machine learning, pattern recognition, natural language understanding, and big data manipulation. It also tackles related new scientific challenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and visualization, to integrative analysis across heterogeneous and interdependent complex resources for better decision-making, collaboration, and, ultimately, value creation. The International Journal of Data Science and Analytics (JDSA) brings together thought leaders, researchers, industry practitioners, and potential users of data science and analytics, to develop the field, discuss new trends and opportunities, exchange ideas and practices, and promote transdisciplinary and cross-domain collaborations.
This asset is a filter (derived view of a dataset) based on the system dataset 'Site Analytics: Referrers', which is automatically generated by the City of Austin Open Data Portal (data.austintexas.gov). A referrer is the previous webpage a user was on when following a link to this domain. This dataset provides referrer information by date, referring domain (which specific domains users were on), and name of the asset the user was sent to. The dataset will reflect new Referrer records within a day of when they occur.
Data provided by: Tyler Technologies Creation date of data source: May 21, 2021
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This archive contains CoDIAC-derived data, including the reference human SH2-ome, annotated structures from PDB and AlphaFold, post-translational modifications, contact map features, mutations, etc.
Geologic domains for the northern Gulf of Mexico derived using the Subsurface Trend Analysis (STA) method. The domains were postulated using geologic province, lithologic, and structural information and validated using statistical methods.
Publication detailing the STA method: Rose, K., Bauer, J.R., and Mark-Moser, M. (2020). Subsurface trend analysis, a multi-variate geospatial approach for subsurface evaluation and uncertainty reduction. Interpretation, vol. 8, issue 1. https://library.seg.org/doi/abs/10.1190/int-2019-0019.1
Detailed discussion of domain formation and analysis: Mark-Moser, M., Miller, R., Bauer, J., Rose, K., and Disenhof, C. (2018). Analysis of Subsurface Reservoir Properties Using a Novel Geospatial Approach, Offshore Gulf of Mexico. NETL-TRS-2018. https://edx.netl.doe.gov/dataset/detailed-analysis-of-geospatial-trends-of-hydrocarbon-accumulations-offshore-gulf-of-mexico
In computer security, botnets still represent a major cyber threat. Concealing techniques such as dynamic addressing and Domain Generation Algorithms (DGAs) require an improved and more effective detection process. To this end, this data descriptor presents a collection of over 30 million manually labelled, algorithmically generated domain names decorated with a feature set ready to use for machine learning analysis. The proposed dataset lets researchers move past the data collection, organization and pre-processing phases and focus on the analysis and on producing machine-learning-powered solutions for network intrusion detection.
Fifty of the most important malware variants were selected. Each family is available both as a list of domains and as a collection of features. More precisely, the former was generated by executing the malware DGAs in a controlled environment with fixed parameters, while the latter was generated by extracting a combination of statistical and Natural Language Processing (NLP) metrics.
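To illustrate the kind of statistical and NLP metrics involved — these are generic lexical features commonly used for DGA detection, not necessarily the exact feature set shipped with the dataset — here is a short Python sketch:

```python
import math
from collections import Counter

def domain_features(domain: str) -> dict:
    """Generic lexical features often used for DGA detection (illustrative only)."""
    label = domain.split(".")[0].lower()   # second-level label, TLD dropped
    counts = Counter(label)
    n = len(label) or 1
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return {
        "length": len(label),
        "entropy": round(entropy, 3),
        "vowel_ratio": round(sum(label.count(v) for v in "aeiou") / n, 3),
        "digit_ratio": round(sum(ch.isdigit() for ch in label) / n, 3),
        "distinct_bigrams": len({label[i:i + 2] for i in range(len(label) - 1)}),
    }

print(domain_features("kqzvxjwpaleb.com"))   # DGA-looking
print(domain_features("wikipedia.org"))      # benign-looking
```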
Zago, Mattia; Gil Pérez, Manuel; Martinez Perez, Gregorio (2020), “UMUDGA - University of Murcia Domain Generation Algorithm Dataset”, Mendeley Data, V1, doi: 10.17632/y8ph45msv8.1
This digital dataset contains the 9 major areas used to subdivide the Central Valley for the interpolation of the percentage of coarse-grained deposits into the texture model. This texture model was used as input data for the hydraulic properties portion of the Central Valley Hydrologic Model (CVHM). The Central Valley encompasses an approximate 50,000 square-kilometer region of California. The complex hydrologic system of the Central Valley is simulated using the USGS numerical modeling code MODFLOW-FMP (Schmid and others, 2006). This simulation is referred to here as the CVHM (Faunt, 2009). Utilizing MODFLOW-FMP, the CVHM simulates groundwater and surface-water flow, irrigated agriculture, land subsidence, and other key processes in the Central Valley on a monthly basis from 1961-2003. The total active modeled area is 20,334 square-miles on a finite difference grid comprising 441 rows and 98 columns. Slightly less than 50 percent of the cells are active. The CVHM model grid has a uniform horizontal discretization of 1x1 square mile and is oriented parallel to the valley axis, 34 degrees west of north (Faunt, 2009). In order to better characterize the aquifer-system deposits, lithologic data from approximately 8,500 drillers' logs of boreholes ranging in depth from 12 to 3,000 feet below land surface were compiled and analyzed. The percentage of coarse-grained sediment, or texture, then was computed for each 50-foot depth interval of the drillers' logs. A 3-dimensional texture model was developed by interpolating the percentage of coarse-grained deposits onto a 1-mile spatial grid at 50-foot-depth intervals from land surface to 2,800 feet below land surface. The CVHM is the most recent regional-scale model of the Central Valley developed by the U.S. Geological Survey (USGS). The CVHM was developed as part of the USGS Groundwater Resources Program (see "Foreword", Chapter A, page iii, for details).
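A heavily simplified Python sketch of the interpolation step described above: scattered percent-coarse values, as would come from the drillers' logs for one 50-foot depth interval, interpolated onto a regular grid. The real texture model uses the CVHM's 1-mile grid and its own interpolation choices; everything below (coordinates, values, method) is illustrative.

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic borehole locations (arbitrary km coordinates) and percent-coarse
# values standing in for one 50-foot depth interval of the drillers' logs.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(500, 2))
pct_coarse = rng.uniform(0, 100, size=500)

# Regular grid standing in for the model's uniform cells.
gx, gy = np.meshgrid(np.linspace(0, 100, 101), np.linspace(0, 100, 101))
texture = griddata(xy, pct_coarse, (gx, gy), method="linear")

print(f"interpolated grid: {texture.shape}, mean % coarse = {np.nanmean(texture):.1f}")
```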
In 2021, AI in Poland was most commonly applied to BI and data analytics, computer vision, data exploration, and NLP.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Enhanced Microsoft Academic Knowledge Graph (EMAKG) is a large dataset of scientific publications and related entities, including authors, institutions, journals, conferences, and fields of study. The dataset originates from the Microsoft Academic Knowledge Graph (MAKG), one of the most extensive freely available knowledge graphs of scholarly data. To build the dataset, we first assessed the limitations of the current MAKG; based on these, several methods were designed to enhance the data and broaden the range of use-case scenarios, particularly in mobility and network analysis. EMAKG provides two main advantages: it has improved usability, facilitating access for non-expert users, and it includes more types of information, obtained by integrating various datasets and sources, which helps expand the application domains. For instance, geographical information could help mobility and migration research. The completeness of the knowledge graph is improved by retrieving and merging information on publications and other entities no longer available in the latest version of MAKG. Furthermore, geographical and collaboration network details are used to provide data on authors as well as their annual locations and career nationalities, together with worldwide yearly stocks and flows. Among others, the dataset also includes: fields of study (and publications) labelled by their discipline(s); abstracts and linguistic features, i.e. standard language codes, tokens, and types; entities’ general information, e.g. date of foundation and type of institution; and academia-related metrics, i.e. the h-index. The resulting dataset maintains all the characteristics of the parent datasets and includes a set of additional subsets and data that can be used for new case studies relating to network analysis, knowledge exchange, linguistics, computational linguistics, and mobility and human migration, among others.
This asset is a filter (derived view of a dataset) based on the system dataset 'Site Analytics: Catalog Search Terms', which is automatically generated by the City of Austin Open Data Portal (data.austintexas.gov). It provides data on the words and phrases entered by site users in search bars that look through the data catalog for relevant information. Catalog searches using the Discovery API are not included.
Each row in the dataset indicates the number of catalog searches made using the search term from the specified user segment during the noted hour.
Data are segmented into the following user types:
• site member: users who have logged in and have been granted a role on the domain
• community user: users who have logged in but do not have a role on the domain
• anonymous: users who have not logged in to the domain
Data are updated by a system process at least once a day, if there is new data to record.
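For context, here is a hedged Python sketch of pulling and summarizing such a dataset through the portal's SODA API; the dataset identifier and column names are placeholders to look up on data.austintexas.gov.

```python
import pandas as pd

# Placeholder Socrata dataset ID and assumed column names.
DATASET_ID = "xxxx-xxxx"
url = f"https://data.austintexas.gov/resource/{DATASET_ID}.csv?$limit=50000"

searches = pd.read_csv(url)
top_terms = (
    searches.groupby(["user_segment", "search_term"])["count"]
    .sum()
    .sort_values(ascending=False)
    .head(20)
)
print(top_terms)
```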
Data provided by: Tyler Technologies Creation date of data source: January 31, 2020