100+ datasets found
  1. Web requests analysis of Italy websites which use Google Analytics

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 9, 2022
    Cite
    Leva, Federico (2022). Web requests analysis of Italy websites which use Google Analytics [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6793112
    Dataset updated
    Aug 9, 2022
    Dataset authored and provided by
    Leva, Federico
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Italy
    Description

    List of 504,038 domains of Italy found to contain Google Analytics.

    The front page of each Italy-related domain name was accessed over HTTPS or HTTP and analysed with webbkoll and jq to gather data about third-party requests, cookies and other privacy-invasive features. Together with the actual URL visited, the user/property ID is provided for 495,663 domains (extracted either from the cookies set or from the URL of requests to Google Analytics). MX and TXT records for the domains are also provided.

    The most common ID found was 23LNSPS7Q6, with over 35k domains calling it (seemingly associated with italiaonline.it). The most common responding IP addresses were 3 AWS IPv4 addresses (over 40k domains) and 2 CloudFlare IPv6 addresses (over 12k domains).
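
    As a rough illustration of the ID extraction described above, the sketch below pulls a Google Analytics ID either from the `tid` query parameter of a request URL or from a `_ga_<ID>` cookie name. It is not the author's webbkoll/jq pipeline: the function names are hypothetical, and the `tid` parameter and `_ga_` cookie naming are assumptions about how Google Analytics usually exposes the ID.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: GA4 container cookies are named "_ga_<ID>".
GA_COOKIE_RE = re.compile(r"^_ga_([A-Z0-9]+)$")


def id_from_request_url(url):
    """Return the ID carried in the `tid` query parameter, or None."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Only look at requests sent to a google-analytics.com host.
    if host != "google-analytics.com" and not host.endswith(".google-analytics.com"):
        return None
    values = parse_qs(parsed.query).get("tid")
    if not values:
        return None
    tid = values[0]
    # Strip a "G-" prefix so URL-derived and cookie-derived IDs compare equal.
    return tid[2:] if tid.startswith("G-") else tid


def id_from_cookies(cookie_names):
    """Return the ID embedded in the first "_ga_<ID>" cookie name, or None."""
    for name in cookie_names:
        match = GA_COOKIE_RE.match(name)
        if match:
            return match.group(1)
    return None


def extract_ga_id(request_urls, cookie_names):
    """Try the cookies first, then fall back to the request URLs."""
    found = id_from_cookies(cookie_names)
    if found:
        return found
    for url in request_urls:
        found = id_from_request_url(url)
        if found:
            return found
    return None
```

    Calling `extract_ga_id(urls, cookie_names)` with the requests and cookie names recorded for one front page yields a single ID, or `None` when neither source carries one.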

  2. Share of analytics firms that find domain knowledge important in India 2016

    • statista.com
    Updated Feb 17, 2021
    Cite
    Statista (2021). Share of analytics firms that find domain knowledge important in India 2016 [Dataset]. https://www.statista.com/statistics/871511/india-share-of-analytics-firms-rating-domain-knowledge-as-important/
    Dataset updated
    Feb 17, 2021
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2016
    Area covered
    India
    Description

    This statistic displays the share of data analytics firms rating domain knowledge as critically important across India in 2016, by market position. In that year, 100 percent of leading firms within the data analytics industry rated domain knowledge as being critically important for their business.

  3. Global Domain Name Data | DNS and Risk Classification via Dataset & API |...

    • datarade.ai
    .csv, .json
    Updated Nov 2, 2024
    + more versions
    Cite
    Datazag (2024). Global Domain Name Data | DNS and Risk Classification via Dataset & API | 267M+ Domains Covering Over 1570 Domain Zones | Updated Daily [Dataset]. https://datarade.ai/data-products/datazag-global-domain-name-data-dns-and-risk-classificatio-datazag
    Available download formats: .csv, .json
    Dataset updated
    Nov 2, 2024
    Dataset authored and provided by
    Datazag
    Area covered
    Lesotho, Bahamas, Marshall Islands, Kenya, Dominica, Norway, State of, Niue, Gambia, Paraguay
    Description

    DomainIQ is a comprehensive global Domain Name dataset for organizations that want to build cyber security, data cleaning and email marketing applications. The dataset consists of the DNS records for over 267 million domains, updated daily, representing more than 90% of all public domains in the world.

    The data is enriched by over thirty unique data points, including identifying the mailbox provider for each domain and using AI based predictive analytics to identify elevated risk domains from both a cyber security and email sending reputation perspective.

    DomainIQ from Datazag offers layered intelligence through a highly flexible API and as a dataset, available for both cloud and on-premises applications. Standard formats include CSV, JSON, Parquet, and DuckDB.

    Custom options are available for any other file or database format. With daily updates and constant research from Datazag, organizations can develop their own market leading cyber security, data cleaning and email marketing applications supported by comprehensive and accurate data from Datazag. Data updates available on a daily, weekly and monthly basis. API data is updated on a daily basis.

  4. DataForSEO Labs API for keyword research and search analytics, real-time...

    • datarade.ai
    .json
    Updated Jun 4, 2021
    Cite
    DataForSEO (2021). DataForSEO Labs API for keyword research and search analytics, real-time data for all Google locations and languages [Dataset]. https://datarade.ai/data-products/dataforseo-labs-api-for-keyword-research-and-search-analytics-dataforseo
    Available download formats: .json
    Dataset updated
    Jun 4, 2021
    Dataset provided by
    Authors
    DataForSEO
    Area covered
    Korea (Democratic People's Republic of), Kenya, Azerbaijan, Isle of Man, Armenia, Mauritania, Micronesia (Federated States of), Tokelau, Cocos (Keeling) Islands, Morocco
    Description

    DataForSEO Labs API offers three powerful keyword research algorithms and historical keyword data:

    • Related Keywords from the “searches related to” element of Google SERP.
    • Keyword Suggestions that match the specified seed keyword with additional words before, after, or within the seed key phrase.
    • Keyword Ideas that fall into the same category as specified seed keywords.
    • Historical Search Volume with current cost-per-click, and competition values.

    Based on in-market categories of Google Ads, you can get keyword ideas from the relevant Categories For Domain and discover relevant Keywords For Categories. You can also obtain Top Google Searches with AdWords and Bing Ads metrics, product categories, and Google SERP data.

    You will find well-rounded ways to scout the competitors:

    • Domain Whois Overview with ranking and traffic info from organic and paid search.
    • Ranked Keywords that any domain or URL has positions for in SERP.
    • SERP Competitors and the rankings they hold for the keywords you specify.
    • Competitors Domain with a full overview of its rankings and traffic from organic and paid search.
    • Domain Intersection keywords for which both specified domains rank within the same SERPs.
    • Subdomains for the target domain you specify along with the ranking distribution across organic and paid search.
    • Relevant Pages of the specified domain with rankings and traffic data.
    • Domain Rank Overview with ranking and traffic data from organic and paid search.
    • Historical Rank Overview with historical data on rankings and traffic of the specified domain from organic and paid search.
    • Page Intersection keywords for which the specified pages rank within the same SERP.

    All DataForSEO Labs API endpoints function in the Live mode. This means you will be provided with the results in response right after sending the necessary parameters with a POST request.

    The limit is 2000 API calls per minute; however, you can contact our support team if your project requires higher rates.
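
    A client that wants to stay under a per-minute quota such as the 2000 calls mentioned above can throttle itself. The following is a minimal sliding-window sketch; the class name and its parameters are hypothetical and not part of any DataForSEO client library, and the clock and sleep functions are injectable only so the behaviour is easy to check.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `period`-second window."""

    def __init__(self, max_calls=2000, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Window is full: wait until the oldest call expires.
            self._sleep(self.period - (now - self._stamps[0]))
```

    Calling `limiter.acquire()` immediately before each POST request keeps the request rate within the configured limit without any server-side coordination.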

    We offer well-rounded API documentation, GUI for API usage control, comprehensive client libraries for different programming languages, free sandbox API testing, ad hoc integration, and deployment support.

    We have a pay-as-you-go pricing model. You simply add funds to your account and use them to get data. The account balance doesn't expire.

  5. Expiring and Deleted Domains Stats

    • whoisfreaks.com
    Updated Oct 10, 2024
    Cite
    WhoisFreaks (2024). Expiring and Deleted Domains Stats [Dataset]. https://whoisfreaks.com/products/expiring-dropped-domains
    Dataset updated
    Oct 10, 2024
    Dataset authored and provided by
    WhoisFreaks
    License

    https://whoisfreaks.com/terms

    Time period covered
    Mar 19, 2025 - Mar 26, 2025
    Area covered
    Lahore, Pakistan
    Description

    The expiring and deleted domains statistics cover both generic top-level domains (gTLDs) and country-code top-level domains (ccTLDs). This dataset helps you stay up to date and make data-driven decisions in the domain industry based on daily updates.

  6. ‘Major US Open Data Domains’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 27, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Major US Open Data Domains’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-major-us-open-data-domains-640b/7461f511/?iid=003-707&v=presentation
    Dataset updated
    Jan 27, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    United States
    Description

    Analysis of ‘Major US Open Data Domains’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/98e060dc-3da0-45e9-bf33-4a37a98ded89 on 27 January 2022.

    --- Dataset description provided by original source is as follows ---

    An incomplete collection of open data domains throughout the U.S. (intended for comparison with King County open data)

    --- Original source retains full ownership of the source dataset ---

  7. A Dataset of Information (DNS, IP, WHOIS/RDAP, TLS, GeoIP) for a Large...

    • zenodo.org
    json
    Updated Dec 11, 2024
    + more versions
    Cite
    Radek Hranický; Jan Polišenský; Adam Horák; Petr Pouč; Kamil Jeřábek; Tomáš Ebert (2024). A Dataset of Information (DNS, IP, WHOIS/RDAP, TLS, GeoIP) for a Large Corpus of Benign, Phishing, and Malware Domain Names 2024 [Dataset]. http://doi.org/10.5281/zenodo.14332167
    Available download formats: json
    Dataset updated
    Dec 11, 2024
    Dataset provided by
    Zenodo
    Authors
    Radek Hranický; Jan Polišenský; Adam Horák; Petr Pouč; Kamil Jeřábek; Tomáš Ebert
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Aug 16, 2024
    Description

    The dataset contains DNS records, IP-related features, WHOIS/RDAP information, information from TLS handshakes and certificates, and GeoIP information for 368,956 benign domains from Cisco Umbrella, 461,338 benign domains from the actual CESNET network traffic, 164,425 phishing domains from PhishTank and OpenPhish services, and 100,809 malware domains from various sources like ThreatFox, The Firebog, MISP threat intelligence platform, and other sources. The ground truth for the phishing dataset was double-checked with the VirusTotal (VT) service. Domain names not considered malicious by VT have been removed from phishing and malware datasets. Similarly, benign domain names that were considered risky by VT have been removed from the benign datasets. The data was collected between March 2023 and July 2024. The final assessment of the data was conducted in August 2024.

    The dataset is useful for cybersecurity research, e.g. statistical analysis of domain data or feature extraction for training machine learning-based classifiers, e.g. for phishing and malware website detection.

    The dataset was created using software available in the associated GitHub repository nesfit/domainradar-dib.

    Data Files

    • The data is located in the following individual files:

      • benign_umbrella.json - data for 368,956 benign domains from Cisco Umbrella,
      • benign_cesnet.json - data for 461,338 benign domains from the CESNET network,
      • phishing.json - data for 164,425 phishing domains, and
      • malware.json - data for 100,809 malware domains.
    • The schema.json file contains a JSON Schema with detailed description of the data entries.

    Data Structure

    Each data file contains a JSON array of records generated using mongoexport (in the MongoDB Extended JSON (v2) format in Relaxed Mode). The following table documents the structure of a record. Please note that:

    • some fields may be missing (they should be interpreted as nulls),
    • extra fields may be present (they should be ignored).

    Field name | Field type | Nullable | Description
    domain_name | String | No | The evaluated domain name
    url | String | No | The source URL for the domain name
    evaluated_on | Date | No | Date of last collection attempt
    source | String | No | An identifier of the source
    sourced_on | Date | No | Date of ingestion of the domain name
    dns | Object | Yes | Data from DNS scan
    rdap | Object | Yes | Data from RDAP or WHOIS
    tls | Object | Yes | Data from TLS handshake
    ip_data | Array of Objects | Yes | Array of data objects capturing the IP addresses related to the domain name
    malware_type | String | No | The malware type/family or “unknown” (only present in malware.json)

    DNS data (dns field)

    Field name | Field type | Nullable | Description
    A | Array of Strings | No | Array of IPv4 addresses
    AAAA | Array of Strings | No | Array of IPv6 addresses
    TXT | Array of Strings | No | Array of raw TXT values
    CNAME | Object | No | The CNAME target and related IPs
    MX | Array of Objects | No | Array of objects with the MX target hostname, priority and related IPs
    NS | Array of Objects | No | Array of objects with the NS target hostname and related IPs
    SOA | Object | No | All the SOA fields, present if found at the target domain name
    zone_SOA | Object | No | The SOA fields of the target’s zone (closest point of delegation), present if found and not a record in the target domain directly
    dnssec | Object | No | Flags describing the DNSSEC validation result for each record type
    ttls | Object | No | The TTL values for each record type
    remarks | Object | No | The zone domain name and DNSSEC flags

    RDAP data (rdap field)

    Field name | Field type | Nullable | Description
    copyright_notice | String | No | RDAP/WHOIS data usage copyright notice
    dnssec | Bool | No | DNSSEC presence flag
    entitites | Object | No | An object with various arrays representing the found related entity types (e.g. abuse, admin, registrant). The arrays contain objects describing the individual entities.
    expiration_date | Date | Yes | The current date of expiration
    handle | String | No | RDAP handle
    last_changed_date | Date | Yes | The date when the domain was last changed
    name | String | No |
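
    The two notes on the record structure (missing fields read as nulls, extra fields ignored) can be sketched as a small normalization step. This is an illustration rather than the dataset authors' tooling: the helper names are hypothetical, only the top-level fields listed in this entry are handled, and the code assumes that Relaxed Mode dates arrive as `{"$date": "<ISO 8601 string>"}`.

```python
import json
from datetime import datetime

# Top-level fields documented for a record.
TOP_LEVEL_FIELDS = (
    "domain_name", "url", "evaluated_on", "source", "sourced_on",
    "dns", "rdap", "tls", "ip_data", "malware_type",
)


def parse_date(value):
    """Convert a {"$date": "..."} wrapper to datetime; pass other values through."""
    if isinstance(value, dict) and isinstance(value.get("$date"), str):
        # Older Python versions reject a trailing "Z" in fromisoformat().
        return datetime.fromisoformat(value["$date"].replace("Z", "+00:00"))
    return value


def normalize(record):
    """Keep only known fields; missing ones become None, extras are dropped."""
    clean = {name: record.get(name) for name in TOP_LEVEL_FIELDS}
    for name in ("evaluated_on", "sourced_on"):
        clean[name] = parse_date(clean[name])
    return clean


def load_records(path):
    """Load one data file (a JSON array) and normalize every record."""
    with open(path, encoding="utf-8") as handle:
        return [normalize(rec) for rec in json.load(handle)]
```

    Note that `json.load` reads the whole array into memory, which may be impractical for the larger files; a streaming parser would be a reasonable substitute in that case.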

  8. Web Scraping Data | Key Customers Domain Name Data | Scanning Logos found on...

    • datarade.ai
    .json
    Updated Jun 27, 2024
    + more versions
    Cite
    PredictLeads (2024). Web Scraping Data | Key Customers Domain Name Data | Scanning Logos found on Websites | 200M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-scraping-data-domain-name-data-business-predictleads
    Available download formats: .json
    Dataset updated
    Jun 27, 2024
    Dataset authored and provided by
    PredictLeads
    Area covered
    Burkina Faso, Benin, Oman, Northern Mariana Islands, Turkmenistan, Malaysia, Nigeria, Svalbard and Jan Mayen, Curaçao, Colombia
    Description

    PredictLeads Key Customers Data offers a critical technical resource for B2B operations, focusing on capturing detailed insights about business relationships directly from company websites. By leveraging advanced web scraping technologies and innovative logo data recognition, we provide extensive Domain Name Data, Logo Data, Company Data, and Business Website Data. This dataset is crucial for executing sophisticated Sentiment Analysis, creating a 360-degree Customer View, enhancing Account Profiling, conducting in-depth Company Analysis, and supporting comprehensive Analytics.

    Key Technical Features for B2B Operations:

    ➡️ Advanced Web Scraping and Logo Data Techniques: PredictLeads employs cutting-edge technologies to detect and analyze key customers represented through logos and mentions on business websites, including case studies and partner pages.
    ➡️ Rich Domain Name and Company Data: Access detailed information on business relationships and company affiliations that are crucial for analyzing market positions and influence.
    ➡️ Comprehensive Business Website Data: Utilize data gathered from company websites to gain insights into their operational networks, partnerships, and customer relationships.

    Enhancing B2B Strategies with PredictLeads Data:

    ➡️ 360-Degree Customer Views: Develop comprehensive views of your customers by integrating detailed key customers data, revealing not just direct relationships but also extended networks.
    ➡️ Account Profiling: Enhance your account profiling efforts by using our connections data to understand the breadth and depth of a company's market engagements and partnerships.
    ➡️ Sentiment Analysis: Apply sentiment analysis techniques to the data collected from business websites and news sources to assess the sentiment surrounding business relationships and market moves.
    ➡️ Company Analysis: Leverage our detailed company and business website data to perform in-depth analyses of company strategies, growth potential, and market influence.
    ➡️ Advanced Analytics: Utilize our comprehensive dataset in your B2B data cleansing processes and analytical models to ensure data accuracy and relevancy in your CRM and marketing automation platforms.

    Strategic Technical Applications in B2B:

    ➡️ Informed Decision-Making: Empower your technical teams with data that highlights strategic key customers and market dynamics, enhancing strategic initiatives and business outcomes.
    ➡️ Enhanced Data Reliability for Technical Operations: Our rigorous data collection and validation processes ensure you work with the most reliable and relevant data, supporting critical assessments and business operations.
    ➡️ Competitive and Market Analysis: Utilize our comprehensive data to conduct detailed analyses of competitors and market trends, providing a strategic edge in planning and execution.

    Why PredictLeads Key Customers Data is Essential for Technical B2B Teams:

    ✅ Designed for Technical Precision: Our solutions are meticulously crafted to meet the specific needs of technical teams, offering unparalleled depth and applicability.
    ✅ Up-to-Date and Comprehensive: Continuous updates and broad coverage ensure that our key customers data captures the dynamic nature of global business environments, providing timely and essential insights.
    ✅ Trusted by Industry Leaders: Recognized for its robust data architecture and precision, PredictLeads is relied upon by technical analysts and data scientists across industries to guide their strategy and operations.

    PredictLeads Key Customers Data is a tool for B2B organizations that rely on deep technical insights to steer their strategic and operational directives. By integrating the key customers data into your systems, you enhance your capacity for informed decision-making, ensuring robust technical operations and strategic advantage in a competitive marketplace.

  9. Leading e-commerce analytics technologies worldwide 2023

    • statista.com
    Updated Jun 23, 2023
    Cite
    Statista (2023). Leading e-commerce analytics technologies worldwide 2023 [Dataset]. https://www.statista.com/statistics/1390127/e-commerce-analytics-technologies/
    Dataset updated
    Jun 23, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    E-commerce companies measure the interactions of online shoppers with products or services throughout the entire shopping experience. As of June 2023, Google's plug-in was the most used e-commerce analytics technology, being active on over 61,000 e-commerce sites worldwide. CM Commerce and AddShoppers followed in the ranking, with 5,396 and 4,818 domains, respectively.

  10. A labeled Ecore metamodel dataset for domain clustering

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Jan 24, 2020
    Cite
    Önder Babur (2020). A labeled Ecore metamodel dataset for domain clustering [Dataset]. http://doi.org/10.5281/zenodo.2585432
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Önder Babur
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    555 manually labeled metamodels mined from GitHub in April 2017.

    Domains: (1) bibliography, (2) conference management, (3) bug/issue tracker, (4) build systems, (5) document/office products, (6) requirement/use case, (7) database/sql, (8) state machines, (9) petri nets

    Procedure for constructing the dataset: fully manual, by searching for certain keywords and regexes (e.g. "state" and "transition" for state machines) in the metamodels and inspecting the results for inclusion.

    Format for the file names: ABSINDEX_CLUSTER_ITEMINDEX_name_hash.ecore
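
    A file name in this format can be split into its parts with a few lines of code. The sketch below assumes that the first three underscore-separated tokens are ABSINDEX, CLUSTER and ITEMINDEX, that the last token is the hash, and that everything in between (which may itself contain underscores) is the name. The function and type names are hypothetical, and all parts are kept as strings because the dataset description does not state their types.

```python
from typing import NamedTuple


class EcoreName(NamedTuple):
    abs_index: str
    cluster: str
    item_index: str
    name: str
    digest: str  # the trailing "hash" token


def parse_ecore_filename(filename):
    """Split ABSINDEX_CLUSTER_ITEMINDEX_name_hash.ecore into its parts."""
    suffix = ".ecore"
    if not filename.endswith(suffix):
        raise ValueError(f"not an .ecore file: {filename!r}")
    parts = filename[: -len(suffix)].split("_")
    if len(parts) < 5:
        raise ValueError(f"too few fields in file name: {filename!r}")
    abs_index, cluster, item_index = parts[:3]
    # The name may contain underscores, so rejoin the middle tokens.
    name = "_".join(parts[3:-1])
    return EcoreName(abs_index, cluster, item_index, name, parts[-1])
```

    For a made-up file name such as `12_3_4_state_machine_ab12cd.ecore`, the `cluster` part would be `"3"` and the `name` part `"state_machine"`.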

  11. Advancing Open and Reproducible Water Data Science by Integrating Data...

    • hydroshare.org
    • beta.hydroshare.org
    • +1 more
    zip
    Updated Jan 9, 2024
    Cite
    Advancing Open and Reproducible Water Data Science by Integrating Data Analytics with an Online Data Repository [Dataset]. https://www.hydroshare.org/resource/45d3427e794543cfbee129c604d7e865
    Available download formats: zip (50.9 MB)
    Dataset updated
    Jan 9, 2024
    Dataset provided by
    HydroShare
    Authors
    Jeffery S. Horsburgh
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon nor mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analyses will be an enabler for transforming scientific inquiries. Building on the HydroShare repository’s established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS), loading of data into performant structures keyed to specific scientific data types and that integrate with existing visualization, analysis, and data science capabilities available in Python, and then writing analysis results back to HydroShare for sharing and eventual publication. These capabilities reduce the technical burden for scientists associated with creating a computational environment for executing analyses by installing and maintaining the packages within CUAHSI’s HydroShare-linked JupyterHub server. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows. The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed within a Python environment on any computer running Microsoft Windows, Apple MacOS, or Linux from the Python Package Index using the PIP utility. 
    They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments like Google Colaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available in GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.

    This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/

  12. International Journal of Data Science and Analytics Acceptance Rate -...

    • researchhelpdesk.org
    Updated Feb 15, 2022
    Cite
    Research Help Desk (2022). International Journal of Data Science and Analytics Acceptance Rate - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/acceptance-rate/418/international-journal-of-data-science-and-analytics
    Dataset updated
    Feb 15, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    International Journal of Data Science and Analytics - Data Science has been established as an important emergent scientific field and paradigm driving research evolution in such disciplines as statistics, computing science and intelligence science, and practical transformation in such domains as science, engineering, the public sector, business, social science, and lifestyle. The field encompasses the larger areas of artificial intelligence, data analytics, machine learning, pattern recognition, natural language understanding, and big data manipulation. It also tackles related new scientific challenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and visualization, to integrative analysis across heterogeneous and interdependent complex resources for better decision-making, collaboration, and, ultimately, value creation. The International Journal of Data Science and Analytics (JDSA) brings together thought leaders, researchers, industry practitioners, and potential users of data science and analytics, to develop the field, discuss new trends and opportunities, exchange ideas and practices, and promote transdisciplinary and cross-domain collaborations.

  13. Site Analytics: Referrers (ODP Dashboard)

    • data.austintexas.gov
    application/rdfxml +5
    Updated Mar 27, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Site Analytics: Referrers (ODP Dashboard) [Dataset]. https://data.austintexas.gov/City-Government/Site-Analytics-Referrers-ODP-Dashboard-/s93g-y2ej
    Available download formats: application/rdfxml, tsv, csv, json, application/rssxml, xml
    Dataset updated
    Mar 27, 2025
    Description

    This asset is a filter (derived view of a dataset) based on the system dataset, 'Site Analytics: Referrers' which is automatically generated by the City of Austin Open Data Portal (data.austintexas.gov). A referrer is the previous webpage a user was on when following a link to this domain. This dataset provides referrer information by date, referring domain (which specific domains users were on), and name of the asset the user was sent to. The dataset will reflect new Referrer records within a day of when they occur.

    Data provided by: Tyler Technologies
    Creation date of data source: May 21, 2021

  14. SH2 domain related data by CoDIAC analysis

    • figshare.com
    zip
    Updated Jul 17, 2024
    Cite
    Kristen Naegle; Alekhya Kandoor (2024). SH2 domain related data by CoDIAC analysis [Dataset]. http://doi.org/10.6084/m9.figshare.26321968.v1
    Available download formats: zip
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    figshare
    Authors
    Kristen Naegle; Alekhya Kandoor
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    CoDIAC-derived data is included in this archive, including the reference human SH2-ome, annotated structures from PDB and AlphaFold, post-translational modifications, contact map features, mutations, etc.

  15. Data from: Subsurface Trend Analysis domains for the northern Gulf of Mexico...

    • osti.gov
    Updated Mar 25, 2020
    Cite
    Bauer, Jennifer; Mark-Moser, MacKenzie; Miller, Roy; Rose, Kelly (2020). Subsurface Trend Analysis domains for the northern Gulf of Mexico [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/1606228
    Dataset updated
    Mar 25, 2020
    Dataset provided by
    United States Department of Energy (http://energy.gov/)
    National Energy Technology Laboratory (https://netl.doe.gov/)
    Authors
    Bauer, Jennifer; Mark-Moser, MacKenzie; Miller, Roy; Rose, Kelly
    Area covered
    Gulf of Mexico (Gulf of America)
    Description

    Geologic domains for the northern Gulf of Mexico derived using the Subsurface Trend Analysis (STA) method. The domains were postulated using geologic province, lithologic, and structural information and validated using statistical methods.

    Publication detailing the STA method: Rose, K., Bauer, J.R., and Mark-Moser, M. (2020) Subsurface trend analysis, a multi-variate geospatial approach for subsurface evaluation and uncertainty reduction, Interpretation, vol. 8, issue 1 https://library.seg.org/doi/abs/10.1190/int-2019-0019.1

    Detailed discussion of domain formation and analysis: Mark-Moser, M., Miller, R., Bauer, J., Rose, K., and C. Disenhof. 2018, Analysis of Subsurface Reservoir Properties Using a Novel Geospatial Approach, Offshore Gulf of Mexico. NETL-TRS-2018 https://edx.netl.doe.gov/dataset/detailed-analysis-of-geospatial-trends-of-hydrocarbon-accumulations-offshore-gulf-of-mexico

  16. UMUDGA - Domain Generation

    • kaggle.com
    zip
    Updated Mar 27, 2021
    + more versions
    Cite
    Saurabh Shahane (2021). UMUDGA - Domain Generation [Dataset]. https://www.kaggle.com/saurabhshahane/domain-generation
    Available download formats: zip (1346047998 bytes)
    Dataset updated
    Mar 27, 2021
    Authors
    Saurabh Shahane
    Description

    Context

    In computer security, network botnets still represent a major cyber threat. Concealment techniques such as dynamic addressing and Domain Name Generation Algorithms (DGAs) require a more effective detection process. To this end, this data descriptor presents a collection of over 30 million manually labelled, algorithmically generated domain names, decorated with a feature set that is ready to use for Machine Learning analysis. The data set lets researchers skip the data collection, organization and pre-processing phases and focus on the analysis and the production of Machine Learning-powered solutions for network intrusion detection.

    Content

    Fifty of the most important malware variants have been selected. Each family is available both as a list of domains and as a collection of features. More precisely, the former is generated by executing the malware DGAs in a controlled environment with fixed parameters, while the latter is generated by extracting a combination of statistical and Natural Language Processing (NLP) metrics.

    Acknowledgements

    Zago, Mattia; Gil Pérez, Manuel; Martinez Perez, Gregorio (2020), “UMUDGA - University of Murcia Domain Generation Algorithm Dataset”, Mendeley Data, V1, doi: 10.17632/y8ph45msv8.1

  17. Spatial Provinces and Domains of the Central Valley for Textural Analysis

    • catalog.data.gov
    • data.usgs.gov
    • +2 more
    Updated Nov 1, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Spatial Provinces and Domains of the Central Valley for Textural Analysis [Dataset]. https://catalog.data.gov/dataset/spatial-provinces-and-domains-of-the-central-valley-for-textural-analysis
    Dataset updated
    Nov 1, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Central Valley
    Description

    This digital dataset contains the 9 major areas used to subdivide the Central Valley for the interpolation of the percentage of coarse-grained deposits into the texture model. This texture model was used as input data for the hydraulic properties portion of the Central Valley Hydrologic Model (CVHM). The Central Valley encompasses an approximately 50,000-square-kilometer region of California. The complex hydrologic system of the Central Valley is simulated using the USGS numerical modeling code MODFLOW-FMP (Schmid and others, 2006). This simulation is referred to here as the CVHM (Faunt, 2009). Utilizing MODFLOW-FMP, the CVHM simulates groundwater and surface-water flow, irrigated agriculture, land subsidence, and other key processes in the Central Valley on a monthly basis from 1961 to 2003. The total active modeled area is 20,334 square miles on a finite-difference grid comprising 441 rows and 98 columns. Slightly less than 50 percent of the cells are active. The CVHM model grid has a uniform horizontal discretization of 1x1 square mile and is oriented parallel to the valley axis, 34 degrees west of north (Faunt, 2009). In order to better characterize the aquifer-system deposits, lithologic data from approximately 8,500 drillers' logs of boreholes ranging in depth from 12 to 3,000 feet below land surface were compiled and analyzed. The percentage of coarse-grained sediment, or texture, was then computed for each 50-foot depth interval of the drillers' logs. A 3-dimensional texture model was developed by interpolating the percentage of coarse-grained deposits onto a 1-mile spatial grid at 50-foot depth intervals from land surface to 2,800 feet below land surface. The CVHM is the most recent regional-scale model of the Central Valley developed by the U.S. Geological Survey (USGS). The CVHM was developed as part of the USGS Groundwater Resources Program (see "Foreword", Chapter A, page iii, for details).

  18. Applications and domains of AI in Poland 2021

    • statista.com
    Updated Apr 10, 2024
    Cite
    Statista (2024). Applications and domains of AI in Poland 2021 [Dataset]. https://www.statista.com/statistics/1228413/poland-apps-and-domains-of-ai/
    Dataset updated
    Apr 10, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2021 - Feb 2021
    Area covered
    Poland
    Description

    In 2021, AI in Poland was most often applied to BI and data analytics, computer vision, data exploration, and NLP.

  19. The Enhanced Microsoft Academic Knowledge Graph

    • datacatalogue.sodanet.gr
    • datacatalogue.cessda.eu
    Updated Apr 30, 2024
    Cite
    Κατάλογος Δεδομένων SoDaNet (2024). The Enhanced Microsoft Academic Knowledge Graph [Dataset]. http://doi.org/10.17903/FK2/TZWQPD
    Dataset updated
    Apr 30, 2024
    Dataset provided by
    Κατάλογος Δεδομένων SoDaNet
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 1800 - Dec 31, 2021
    Area covered
    Worldwide
    Dataset funded by
    European Commission
    Description

    The Enhanced Microsoft Academic Knowledge Graph (EMAKG) is a large dataset of scientific publications and related entities, including authors, institutions, journals, conferences, and fields of study. The proposed dataset originates from the Microsoft Academic Knowledge Graph (MAKG), one of the most extensive freely available knowledge graphs of scholarly data. To build the dataset, we first assessed the limitations of the current MAKG. Then, based on these, several methods were designed to enhance data and expand the number of use case scenarios, particularly in mobility and network analysis.

    EMAKG provides two main advantages:
    • It has improved usability, facilitating access for non-expert users.
    • It includes an increased number of types of information obtained by integrating various datasets and sources, which helps expand the application domains. For instance, geographical information could help mobility and migration research.

    The knowledge graph completeness is improved by retrieving and merging information on publications and other entities no longer available in the latest version of MAKG. Furthermore, geographical and collaboration network details are employed to provide data on authors as well as their annual locations and career nationalities, together with worldwide yearly stocks and flows.

    Among others, the dataset also includes:
    • fields of study (and publications) labelled by their discipline(s);
    • abstracts and linguistic features, i.e., standard language codes, tokens, and types;
    • entities’ general information, e.g., date of foundation and type of institutions; and
    • academia-related metrics, i.e., h-index.

    The resulting dataset maintains all the characteristics of the parent datasets and includes a set of additional subsets and data that can be used for new case studies relating to network analysis, knowledge exchange, linguistics, computational linguistics, and mobility and human migration, among others.

  20. Site Analytics: Catalog Search Terms (ODP Dashboard)

    • data.austintexas.gov
    application/rdfxml +5
    Updated Mar 26, 2025
    Cite
    (2025). Site Analytics: Catalog Search Terms (ODP Dashboard) [Dataset]. https://data.austintexas.gov/City-Government/Site-Analytics-Catalog-Search-Terms-ODP-Dashboard-/8sxf-t34r
    Available download formats: json, csv, xml, application/rdfxml, tsv, application/rssxml
    Dataset updated
    Mar 26, 2025
    Description

    This asset is a filter (derived view of a dataset) based on the system dataset 'Site Analytics: Catalog Search Terms', which is automatically generated by the City of Austin Open Data Portal (data.austintexas.gov). It provides data on the words and phrases entered by site users in search bars that look through the data catalog for relevant information. Catalog searches using the Discovery API are not included.

    Each row in the dataset indicates the number of catalog searches made using the search term from the specified user segment during the noted hour.

    Data are segmented into the following user types:
    • site member: users who have logged in and have been granted a role on the domain
    • community user: users who have logged in but do not have a role on the domain
    • anonymous: users who have not logged in to the domain

    Data are updated by a system process at least once a day, if there is new data to record.

    Data provided by: Tyler Technologies
    Creation date of data source: January 31, 2020
