CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
List of 504,038 domains of Italy found to contain Google Analytics.
The front page of each Italy-related domain was accessed over HTTPS or HTTP and analysed with webbkoll and jq to gather data about third-party requests, cookies and other privacy-invasive features. Together with the actual URL visited, the user/property ID is provided for 495,663 domains (extracted either from the cookies deposited or from the URLs of requests to Google Analytics). MX and TXT records for the domains are also provided.
The most common ID found was 23LNSPS7Q6, referenced by over 35k domains (seemingly associated with italiaonline.it). The most common responding IP addresses were three AWS IPv4 addresses (over 40k domains) and two Cloudflare IPv6 addresses (over 12k domains).
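For illustration, here is a minimal Python sketch of the kind of ID extraction described above: pulling a Google Analytics measurement/property identifier out of a request URL or a GA4-style cookie name. The exact webbkoll output fields are not documented here, so the input strings and the regular expression are assumptions, not the project's actual pipeline.

```python
import re

# Google Analytics identifiers typically look like UA-12345678-1 (Universal
# Analytics) or G-XXXXXXXXXX (GA4 measurement IDs). The pattern is an
# assumption about what the crawl extracted, not the dataset's actual code.
GA_ID_RE = re.compile(r"\b(UA-\d{4,10}(?:-\d{1,4})?|G-[A-Z0-9]{6,12})\b")

def extract_ga_ids(text: str) -> set:
    """Return all GA-looking identifiers found in a URL or cookie string."""
    return set(GA_ID_RE.findall(text))

# Hypothetical collect-request URL and GA4 cookie name.
request_url = "https://www.google-analytics.com/g/collect?v=2&tid=G-23LNSPS7Q6&cid=555"
cookie_name = "_ga_23LNSPS7Q6"

print(extract_ga_ids(request_url))                              # {'G-23LNSPS7Q6'}
print(extract_ga_ids("G-" + cookie_name.removeprefix("_ga_")))  # {'G-23LNSPS7Q6'}
```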
This statistic shows the share of data analytics firms in India that rated domain knowledge as critically important in 2016, by market position. In that year, 100 percent of leading firms in the data analytics industry rated domain knowledge as critically important for their business.
DomainIQ is a comprehensive global Domain Name dataset for organizations that want to build cyber security, data cleaning and email marketing applications. The dataset consists of the DNS records for over 267 million domains, updated daily, representing more than 90% of all public domains in the world.
The data is enriched by over thirty unique data points, including identifying the mailbox provider for each domain and using AI based predictive analytics to identify elevated risk domains from both a cyber security and email sending reputation perspective.
DomainIQ from Datazag offers layered intelligence through a highly flexible API and as a dataset, available for both cloud and on-premises applications. Standard formats include CSV, JSON, Parquet, and DuckDB.
Custom options are available for any other file or database format. With daily updates and continuous research from Datazag, organizations can develop market-leading cyber security, data cleaning and email marketing applications supported by comprehensive and accurate data. Data updates are available on a daily, weekly or monthly basis; API data is updated daily.
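As a usage sketch only — the actual DomainIQ schema and delivery file names are not documented here, so the file name and column names below are hypothetical — loading a Parquet delivery with DuckDB might look like this:

```python
import duckdb

# Hypothetical file name and columns; the real DomainIQ schema may differ.
con = duckdb.connect()
risky = con.execute(
    """
    SELECT domain, mailbox_provider, risk_score
    FROM read_parquet('domainiq_daily.parquet')
    WHERE risk_score >= 0.8
    ORDER BY risk_score DESC
    LIMIT 20
    """
).df()
print(risky)
```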
DataForSEO Labs API offers three powerful keyword research algorithms and historical keyword data:
• Related Keywords from the “searches related to” element of Google SERP.
• Keyword Suggestions that match the specified seed keyword with additional words before, after, or within the seed key phrase.
• Keyword Ideas that fall into the same category as specified seed keywords.
• Historical Search Volume with current cost-per-click and competition values.
Based on in-market categories of Google Ads, you can get keyword ideas from the relevant Categories For Domain and discover relevant Keywords For Categories. You can also obtain Top Google Searches with AdWords and Bing Ads metrics, product categories, and Google SERP data.
You will find well-rounded ways to scout competitors:
• Domain Whois Overview with ranking and traffic info from organic and paid search.
• Ranked Keywords that any domain or URL has positions for in SERP.
• SERP Competitors and the rankings they hold for the keywords you specify.
• Competitors Domain with a full overview of its rankings and traffic from organic and paid search.
• Domain Intersection keywords for which both specified domains rank within the same SERPs.
• Subdomains for the target domain you specify along with the ranking distribution across organic and paid search.
• Relevant Pages of the specified domain with rankings and traffic data.
• Domain Rank Overview with ranking and traffic data from organic and paid search.
• Historical Rank Overview with historical data on rankings and traffic of the specified domain from organic and paid search.
• Page Intersection keywords for which the specified pages rank within the same SERP.
All DataForSEO Labs API endpoints operate in Live mode: results are returned in the response immediately after you send the necessary parameters in a POST request.
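For instance, a hedged Python sketch of a Live-mode call is shown below. The endpoint path and payload fields follow DataForSEO's documented v3 conventions as an assumption; verify them against the current API documentation and replace the placeholder credentials.

```python
import base64
import json
from urllib.request import Request, urlopen

# Placeholder credentials; DataForSEO uses HTTP Basic auth with the API login/password.
login, password = "your_login", "your_password"
auth = base64.b64encode(f"{login}:{password}".encode()).decode()

# Assumed endpoint for "Related Keywords" in Live mode; check the current docs.
url = "https://api.dataforseo.com/v3/dataforseo_labs/google/related_keywords/live"
payload = [{"keyword": "domain analytics", "location_code": 2840, "language_code": "en"}]

req = Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Authorization": f"Basic {auth}", "Content-Type": "application/json"},
)
with urlopen(req) as resp:
    result = json.load(resp)

# Responses wrap results in a "tasks" array; the field names here are assumptions.
print(result["tasks"][0]["status_message"])
```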
The limit is 2,000 API calls per minute; contact our support team if your project requires higher rates.
We offer well-rounded API documentation, GUI for API usage control, comprehensive client libraries for different programming languages, free sandbox API testing, ad hoc integration, and deployment support.
We have a pay-as-you-go pricing model. You simply add funds to your account and use them to get data. The account balance doesn't expire.
https://whoisfreaks.com/terms
The expiring and deleted domains statistics cover both generic top-level domains (gTLDs) and country-code top-level domains (ccTLDs). This dataset helps you stay up to date and make data-driven decisions in the domain industry based on daily updates.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Major US Open Data Domains’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/98e060dc-3da0-45e9-bf33-4a37a98ded89 on 27 January 2022.
--- Dataset description provided by original source is as follows ---
An incomplete collection of open data domains throughout the U.S. (intended for comparison with King County open data)
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains DNS records, IP-related features, WHOIS/RDAP information, information from TLS handshakes and certificates, and GeoIP information for 368,956 benign domains from Cisco Umbrella, 461,338 benign domains observed in actual CESNET network traffic, 164,425 phishing domains from the PhishTank and OpenPhish services, and 100,809 malware domains from sources such as ThreatFox, The Firebog, and the MISP threat intelligence platform. The ground truth for the phishing dataset was double-checked against the VirusTotal (VT) service: domain names not considered malicious by VT were removed from the phishing and malware datasets, and benign domain names considered risky by VT were removed from the benign datasets. The data was collected between March 2023 and July 2024; the final assessment of the data was conducted in August 2024.
The dataset is useful for cybersecurity research, e.g. statistical analysis of domain data or feature extraction for training machine-learning-based classifiers for phishing and malware website detection.
The dataset was created using software available in the associated GitHub repository nesfit/domainradar-dib.
The data is located in the following individual files:
Both files contain a JSON array of records generated using mongoexport (in the MongoDB Extended JSON (v2) format in Relaxed Mode). The following table documents the structure of a record. Please note that:
Field name | Field type | Nullable | Description
domain_name | String | No | The evaluated domain name
url | String | No | The source URL for the domain name
evaluated_on | Date | No | Date of last collection attempt
source | String | No | An identifier of the source
sourced_on | Date | No | Date of ingestion of the domain name
dns | Object | Yes | Data from DNS scan
rdap | Object | Yes | Data from RDAP or WHOIS
tls | Object | Yes | Data from TLS handshake
ip_data | Array of Objects | Yes | Array of data objects capturing the IP addresses related to the domain name
malware_type | String | No | The malware type/family or “unknown” (only present in malware.json)

DNS data (dns field)
A | Array of Strings | No | Array of IPv4 addresses
AAAA | Array of Strings | No | Array of IPv6 addresses
TXT | Array of Strings | No | Array of raw TXT values
CNAME | Object | No | The CNAME target and related IPs
MX | Array of Objects | No | Array of objects with the MX target hostname, priority and related IPs
NS | Array of Objects | No | Array of objects with the NS target hostname and related IPs
SOA | Object | No | All the SOA fields, present if found at the target domain name
zone_SOA | Object | No | The SOA fields of the target’s zone (closest point of delegation), present if found and not a record in the target domain directly
dnssec | Object | No | Flags describing the DNSSEC validation result for each record type
ttls | Object | No | The TTL values for each record type
remarks | Object | No | The zone domain name and DNSSEC flags

RDAP data (rdap field)
copyright_notice | String | No | RDAP/WHOIS data usage copyright notice
dnssec | Bool | No | DNSSEC presence flag
entitites | Object | No | An object with various arrays representing the found related entity types (e.g. abuse, admin, registrant). The arrays contain objects describing the individual entities.
expiration_date | Date | Yes | The current date of expiration
handle | String | No | RDAP handle
last_changed_date | Date | Yes | The date when the domain was last changed
name | String | No |
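A minimal Python sketch for loading one of these exports and tallying records by malware type, using the fields documented in the table above. Only malware.json is named explicitly in the table; treat the file name as an assumption for the other exports.

```python
import json
from collections import Counter

# mongoexport with --jsonArray writes a single JSON array in Relaxed Extended
# JSON, so the standard json module can read it directly (dates appear as
# {"$date": ...} objects, which are left untouched here).
with open("malware.json", encoding="utf-8") as f:   # file name from the table above
    records = json.load(f)

by_type = Counter(r.get("malware_type", "unknown") for r in records)
print(by_type.most_common(10))

# Reaching into the nested DNS data documented above:
with_ipv4 = sum(1 for r in records if r.get("dns") and r["dns"].get("A"))
print(f"{with_ipv4} of {len(records)} records have at least one A record")
```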
PredictLeads Key Customers Data offers a critical technical resource for B2B operations, focusing on capturing detailed insights about business relationships directly from company websites. By leveraging advanced web scraping technologies and innovative logo data recognition, we provide extensive Domain Name Data, Logo Data, Company Data, and Business Website Data. This dataset is crucial for executing sophisticated Sentiment Analysis, creating a 360-degree Customer View, enhancing Account Profiling, conducting in-depth Company Analysis, and supporting comprehensive Analytics.
Key Technical Features for B2B Operations:
➡️ Advanced Web Scraping and Logo Data Techniques: PredictLeads employs cutting-edge technologies to detect and analyze key customers represented through logos and mentions on business websites, including case studies and partner pages.
➡️ Rich Domain Name and Company Data: Access detailed information on business relationships and company affiliations that are crucial for analyzing market positions and influence.
➡️ Comprehensive Business Website Data: Utilize data gathered from company websites to gain insights into their operational networks, partnerships, and customer relationships.
Enhancing B2B Strategies with PredictLeads Data:
➡️ 360-Degree Customer Views: Develop comprehensive views of your customers by integrating detailed key customers data, revealing not just direct relationships but also extended networks.
➡️ Account Profiling: Enhance your account profiling efforts by using our connections data to understand the breadth and depth of a company's market engagements and partnerships.
➡️ Sentiment Analysis: Apply sentiment analysis techniques to the data collected from business websites and news sources to assess the sentiment surrounding business relationships and market moves.
➡️ Company Analysis: Leverage our detailed company and business website data to perform in-depth analyses of company strategies, growth potential, and market influence.
➡️ Advanced Analytics: Utilize our comprehensive dataset in your B2B data cleansing processes and analytical models to ensure data accuracy and relevancy in your CRM and marketing automation platforms.
Strategic Technical Applications in B2B:
➡️ Informed Decision-Making: Empower your technical teams with data that highlights strategic key customers and market dynamics, enhancing strategic initiatives and business outcomes.
➡️ Enhanced Data Reliability for Technical Operations: Our rigorous data collection and validation processes ensure you work with the most reliable and relevant data, supporting critical assessments and business operations.
➡️ Competitive and Market Analysis: Utilize our comprehensive data to conduct detailed analyses of competitors and market trends, providing a strategic edge in planning and execution.
Why PredictLeads Key Customers Data is Essential for Technical B2B Teams:
✅ Designed for Technical Precision: Our solutions are meticulously crafted to meet the specific needs of technical teams, offering unparalleled depth and applicability.
✅ Up-to-Date and Comprehensive: Continuous updates and broad coverage ensure that our key customers data captures the dynamic nature of global business environments, providing timely and essential insights.
✅ Trusted by Industry Leaders: Recognized for its robust data architecture and precision, PredictLeads is relied upon by technical analysts and data scientists across industries to guide their strategy and operations.
PredictLeads Key Customers Data is a tool for B2B organizations that rely on deep technical insights to steer their strategic and operational directives. By integrating the key customers data into your systems, you enhance your capacity for informed decision-making, ensuring robust technical operations and strategic advantage in a competitive marketplace.
E-commerce companies measure the interactions of online shoppers with products or services throughout the entire shopping experience. As of June 2023, Google's plug-in was the most used e-commerce analytics technology, being active on over 61,000 e-commerce sites worldwide. CM Commerce and AddShoppers followed in the ranking, with 5,396 and 4,818 domains, respectively.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
555 manually labeled metamodels mined from GitHub in April 2017.
Domains: (1) bibliography, (2) conference management, (3) bug/issue tracker, (4) build systems, (5) document/office products, (6) requirement/use case, (7) database/sql, (8) state machines, (9) petri nets
Procedure for constructing the dataset: fully manual, by searching for certain keywords and regexes (e.g. "state" and "transition" for state machines) in the metamodels and inspecting the results for inclusion.
Format for the file names: ABSINDEX_CLUSTER_ITEMINDEX_name_hash.ecore
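As an illustration of this naming scheme, the small Python parser below splits a file name into its parts. The assumptions (numeric indices, the hash being the last underscore-separated token before the .ecore extension, and the example file name itself) are hypothetical.

```python
import re

# ABSINDEX_CLUSTER_ITEMINDEX_name_hash.ecore — the name part may itself
# contain underscores, so it is matched greedily up to the final hash token.
FILENAME_RE = re.compile(
    r"^(?P<abs_index>\d+)_(?P<cluster>\d+)_(?P<item_index>\d+)_(?P<name>.+)_(?P<hash>[0-9a-fA-F]+)\.ecore$"
)

def parse_metamodel_filename(filename: str) -> dict:
    m = FILENAME_RE.match(filename)
    if not m:
        raise ValueError(f"unexpected file name format: {filename}")
    return m.groupdict()

# Hypothetical example file name.
print(parse_metamodel_filename("12_8_3_simple_state_machine_9f86d081.ecore"))
```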
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon nor mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analyses will be an enabler for transforming scientific inquiries.

Building on the HydroShare repository’s established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing time spent choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS); loading of data into performant structures keyed to specific scientific data types that integrate with existing visualization, analysis, and data science capabilities available in Python; and writing analysis results back to HydroShare for sharing and eventual publication. These capabilities reduce the technical burden on scientists of creating a computational environment for executing analyses, because the packages are installed and maintained within CUAHSI’s HydroShare-linked JupyterHub server. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows.

The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed within a Python environment on any computer running Microsoft Windows, Apple macOS, or Linux from the Python Package Index using the pip utility. They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments such as Google Colaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available on GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.
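As a brief, hedged sketch of how the two packages are typically used together (the NWIS site number, dates, HydroShare resource ID, and credentials below are placeholders, and the exact call signatures should be checked against each package's documentation):

```python
# pip install hsclient dataretrieval
import dataretrieval.nwis as nwis
from hsclient import HydroShare

# Retrieve daily-value streamflow records from NWIS (placeholder site and dates).
flow, metadata = nwis.get_dv(sites="03339000", start="2022-01-01", end="2022-12-31")
print(flow.head())

# Authenticate against HydroShare and download a resource for local analysis
# (placeholder resource ID and credentials).
hs = HydroShare(username="your_user", password="your_password")
resource = hs.resource("0123456789abcdef0123456789abcdef")
resource.download("data/")
```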
This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/
International Journal of Data Science and Analytics - Data Science has been established as an important emergent scientific field and paradigm driving research evolution in such disciplines as statistics, computing science and intelligence science, and practical transformation in such domains as science, engineering, the public sector, business, social science, and lifestyle. The field encompasses the larger areas of artificial intelligence, data analytics, machine learning, pattern recognition, natural language understanding, and big data manipulation. It also tackles related new scientific challenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and visualization, to integrative analysis across heterogeneous and interdependent complex resources for better decision-making, collaboration, and, ultimately, value creation. The International Journal of Data Science and Analytics (JDSA) brings together thought leaders, researchers, industry practitioners, and potential users of data science and analytics, to develop the field, discuss new trends and opportunities, exchange ideas and practices, and promote transdisciplinary and cross-domain collaborations.
This asset is a filter (derived view of a dataset) based on the system dataset 'Site Analytics: Referrers', which is automatically generated by the City of Austin Open Data Portal (data.austintexas.gov). A referrer is the previous webpage a user was on when following a link to this domain. This dataset provides referrer information by date, referring domain (which specific domains users were on), and name of the asset the user was sent to. The dataset will reflect new Referrer records within a day of when they occur.
Data provided by: Tyler Technologies Creation date of data source: May 21, 2021
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This archive contains CoDIAC-derived data, including the reference human SH2-ome, annotated structures from PDB and AlphaFold, post-translational modifications, contact map features, mutations, etc.
Geologic domains for the northern Gulf of Mexico derived using the Subsurface Trend Analysis (STA) method. The domains were postulated using geologic province, lithologic, and structural information and validated using statistical methods.
Publication detailing the STA method: Rose, K., Bauer, J.R., and Mark-Moser, M. (2020). Subsurface trend analysis, a multi-variate geospatial approach for subsurface evaluation and uncertainty reduction. Interpretation, vol. 8, issue 1. https://library.seg.org/doi/abs/10.1190/int-2019-0019.1
Detailed discussion of domain formation and analysis: Mark-Moser, M., Miller, R., Bauer, J., Rose, K., and Disenhof, C. (2018). Analysis of Subsurface Reservoir Properties Using a Novel Geospatial Approach, Offshore Gulf of Mexico. NETL-TRS-2018. https://edx.netl.doe.gov/dataset/detailed-analysis-of-geospatial-trends-of-hydrocarbon-accumulations-offshore-gulf-of-mexico
In computer security, botnets still represent a major cyber threat. Concealing techniques such as dynamic addressing and Domain Generation Algorithms (DGAs) require an improved and more effective detection process. To this end, this data descriptor presents a collection of over 30 million manually labelled, algorithmically generated domain names decorated with a feature set ready to use for machine learning analysis. The proposed dataset lets researchers move past the data collection, organization and pre-processing phases and focus on the analysis and on producing machine-learning-powered solutions for network intrusion detection.
Fifty of the most important malware variants were selected. Each family is available both as a list of domains and as a collection of features. More precisely, the former was generated by executing the malware DGAs in a controlled environment with fixed parameters, while the latter was generated by extracting a combination of statistical and Natural Language Processing (NLP) metrics.
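To illustrate the kind of statistical and NLP metrics involved — these are generic lexical features commonly used for DGA detection, not necessarily the exact feature set shipped with the dataset — here is a short Python sketch:

```python
import math
from collections import Counter

def domain_features(domain: str) -> dict:
    """Generic lexical features often used for DGA detection (illustrative only)."""
    label = domain.split(".")[0].lower()   # second-level label, TLD dropped
    counts = Counter(label)
    n = len(label) or 1
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return {
        "length": len(label),
        "entropy": round(entropy, 3),
        "vowel_ratio": round(sum(label.count(v) for v in "aeiou") / n, 3),
        "digit_ratio": round(sum(ch.isdigit() for ch in label) / n, 3),
        "distinct_bigrams": len({label[i:i + 2] for i in range(len(label) - 1)}),
    }

print(domain_features("kqzvxjwpaleb.com"))   # DGA-looking
print(domain_features("wikipedia.org"))      # benign-looking
```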
Zago, Mattia; Gil Pérez, Manuel; Martinez Perez, Gregorio (2020), “UMUDGA - University of Murcia Domain Generation Algorithm Dataset”, Mendeley Data, V1, doi: 10.17632/y8ph45msv8.1
This digital dataset contains the 9 major areas used to subdivide the Central Valley for the interpolation of the percentage of coarse-grained deposits into the texture model. This texture model was used as input data for the hydraulic properties portion of the Central Valley Hydrologic Model (CVHM). The Central Valley encompasses an approximate 50,000 square-kilometer region of California. The complex hydrologic system of the Central Valley is simulated using the USGS numerical modeling code MODFLOW-FMP (Schmid and others, 2006). This simulation is referred to here as the CVHM (Faunt, 2009). Utilizing MODFLOW-FMP, the CVHM simulates groundwater and surface-water flow, irrigated agriculture, land subsidence, and other key processes in the Central Valley on a monthly basis from 1961-2003. The total active modeled area is 20,334 square-miles on a finite difference grid comprising 441 rows and 98 columns. Slightly less than 50 percent of the cells are active. The CVHM model grid has a uniform horizontal discretization of 1x1 square mile and is oriented parallel to the valley axis, 34 degrees west of north (Faunt, 2009). In order to better characterize the aquifer-system deposits, lithologic data from approximately 8,500 drillers' logs of boreholes ranging in depth from 12 to 3,000 feet below land surface were compiled and analyzed. The percentage of coarse-grained sediment, or texture, then was computed for each 50-foot depth interval of the drillers' logs. A 3-dimensional texture model was developed by interpolating the percentage of coarse-grained deposits onto a 1-mile spatial grid at 50-foot-depth intervals from land surface to 2,800 feet below land surface. The CVHM is the most recent regional-scale model of the Central Valley developed by the U.S. Geological Survey (USGS). The CVHM was developed as part of the USGS Groundwater Resources Program (see "Foreword", Chapter A, page iii, for details).
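A heavily simplified Python sketch of the interpolation step described above: scattered percent-coarse values, as would come from the drillers' logs for one 50-foot depth interval, interpolated onto a regular grid. The real texture model uses the CVHM's 1-mile grid and its own interpolation choices; everything below (coordinates, values, method) is illustrative.

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic borehole locations (arbitrary km coordinates) and percent-coarse
# values standing in for one 50-foot depth interval of the drillers' logs.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(500, 2))
pct_coarse = rng.uniform(0, 100, size=500)

# Regular grid standing in for the model's uniform cells.
gx, gy = np.meshgrid(np.linspace(0, 100, 101), np.linspace(0, 100, 101))
texture = griddata(xy, pct_coarse, (gx, gy), method="linear")

print(f"interpolated grid: {texture.shape}, mean % coarse = {np.nanmean(texture):.1f}")
```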
In 2021, AI in Poland was most commonly applied to BI and data analytics, computer vision, data exploration, and NLP.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Enhanced Microsoft Academic Knowledge Graph (EMAKG) is a large dataset of scientific publications and related entities, including authors, institutions, journals, conferences, and fields of study. The dataset originates from the Microsoft Academic Knowledge Graph (MAKG), one of the most extensive freely available knowledge graphs of scholarly data. To build the dataset, we first assessed the limitations of the current MAKG; based on these, several methods were designed to enhance the data and broaden the range of use-case scenarios, particularly in mobility and network analysis. EMAKG provides two main advantages: it has improved usability, facilitating access for non-expert users, and it includes more types of information, obtained by integrating various datasets and sources, which helps expand the application domains. For instance, geographical information could help mobility and migration research. The completeness of the knowledge graph is improved by retrieving and merging information on publications and other entities no longer available in the latest version of MAKG. Furthermore, geographical and collaboration network details are used to provide data on authors as well as their annual locations and career nationalities, together with worldwide yearly stocks and flows. Among others, the dataset also includes: fields of study (and publications) labelled by their discipline(s); abstracts and linguistic features, i.e. standard language codes, tokens, and types; entities’ general information, e.g. date of foundation and type of institution; and academia-related metrics, i.e. the h-index. The resulting dataset maintains all the characteristics of the parent datasets and includes a set of additional subsets and data that can be used for new case studies relating to network analysis, knowledge exchange, linguistics, computational linguistics, and mobility and human migration, among others.
This asset is a filter (derived view of a dataset) based on the system dataset 'Site Analytics: Catalog Search Terms', which is automatically generated by the City of Austin Open Data Portal (data.austintexas.gov). It provides data on the words and phrases entered by site users in search bars that look through the data catalog for relevant information. Catalog searches using the Discovery API are not included.
Each row in the dataset indicates the number of catalog searches made using the search term from the specified user segment during the noted hour.
Data are segmented into the following user types:
• site member: users who have logged in and have been granted a role on the domain
• community user: users who have logged in but do not have a role on the domain
• anonymous: users who have not logged in to the domain
Data are updated by a system process at least once a day, if there is new data to record.
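For context, here is a hedged Python sketch of pulling and summarizing such a dataset through the portal's SODA API; the dataset identifier and column names are placeholders to look up on data.austintexas.gov.

```python
import pandas as pd

# Placeholder Socrata dataset ID and assumed column names.
DATASET_ID = "xxxx-xxxx"
url = f"https://data.austintexas.gov/resource/{DATASET_ID}.csv?$limit=50000"

searches = pd.read_csv(url)
top_terms = (
    searches.groupby(["user_segment", "search_term"])["count"]
    .sum()
    .sort_values(ascending=False)
    .head(20)
)
print(top_terms)
```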
Data provided by: Tyler Technologies Creation date of data source: January 31, 2020