4 datasets found
  1. Usage Statistics for University of Tasmania EPrints Repository

    • researchdata.edu.au
    Updated Apr 27, 2017
    Cite
    Sale, Arthur (2017). Usage Statistics for University of Tasmania EPrints Repository [Dataset]. https://researchdata.edu.au/usage-statistics-university-eprints-repository/927350
    Dataset updated
    Apr 27, 2017
    Dataset provided by
    University of Tasmania, Australia
    Authors
    Sale, Arthur
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    The dataset is an active collection of access data for information items in the University of Tasmania’s EPrints repository. Each night a scheduled task resumes processing the Apache access logs from where it left off the previous night. Each download of an open access full-text item generates a record in the MySQL database, together with a timestamp and the approximate location of the computer system that made the download. The location is obtained by looking up the IP address in the GeoIP database, with one significant difference: downloads originating from a University of Tasmania IP address are identified separately and removed from the ‘Australia’ category. This prevents vanity searches from achieving undue significance. Countries are coded using the ISO 3166 two-letter code.
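The classification step described above can be sketched as follows. This is a minimal illustration, not the repository's actual PHP/Perl code: the campus network range is a documentation-only placeholder, the category label "UTAS" is hypothetical, and a plain dictionary stands in for the GeoIP lookup.

```python
import ipaddress

# Hypothetical campus range (a reserved documentation network), NOT the real one.
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

# Stand-in for a GeoIP database: maps an IP string to an ISO 3166 two-letter code.
SAMPLE_LOOKUP = {
    "192.0.2.10": "AU",
    "198.51.100.7": "AU",
    "203.0.113.5": "DE",
}


def classify_download(ip, country_lookup):
    """Return the category for one download.

    Campus addresses get their own category (assumed label "UTAS") instead of
    being counted under Australia; everything else gets the looked-up country
    code, or "??" when the address is unknown to the lookup.
    """
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in CAMPUS_NETWORKS):
        return "UTAS"
    return country_lookup.get(ip, "??")
```

Checking the campus range before the country lookup is what keeps on-campus downloads out of the ‘Australia’ total, even though GeoIP would report them as Australian.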

    The dataset has been used to analyse the usage made of the repository and to tune it to achieve maximal visibility for the University of Tasmania. Researchers with items in the repository have used it to identify the types of use being made of their work, and to find potential collaborators. The citation of a work in a journal or conference article, for example, causes a typical step in usage, and the citing article can be searched in Google or Google Scholar to identify the authors. This enhances the dissemination experience and its value.

    The software was written at the University of Tasmania by Professor Arthur Sale (in PHP), based on earlier work by the University of Melbourne (with permission). Mr Christian McGee wrote some critical sections of the code in Perl and set up the cron scheduling.

    The dataset is generated by a computer program written by Professor Arthur Sale. The software was a test bed for ideas, and subsequently resulted in an official software set included in the EPrints distribution. This set expanded on the concepts significantly.

  2. 815 Million Global Contact Data - B2B / Email / Mobile Phone / LinkedIn URL...

    • datarade.ai
    .json, .csv
    Cite
    RampedUp Global Data Solutions, 815 Million Global Contact Data - B2B / Email / Mobile Phone / LinkedIn URL - RampedUp [Dataset]. https://datarade.ai/data-products/global-contact-data-personal-and-professional-840-million-rampedup-global-data-solutions
    Available download formats: .json, .csv
    Dataset authored and provided by
    RampedUp Global Data Solutions
    Area covered
    Greece, Ireland, Haiti, Pakistan, Sint Eustatius and Saba, Bolivia (Plurinational State of), Chad, Grenada, Uganda, United States Minor Outlying Islands
    Description

    Sign Up for a free trial: https://rampedup.io/sign-up-%2F-log-in - 7 Days and 50 Credits to test our quality and accuracy.

    These are the fields available within the RampedUp Global dataset.

    CONTACT DATA:
    • Personal Email Address - We manage over 115 million personal email addresses
    • Professional Email - We manage over 200 million professional email addresses
    • Home Address - We manage over 20 million home addresses
    • Mobile Phones - 65 million direct lines to decision makers
    • Social Profiles - Individual Facebook, Twitter, and LinkedIn
    • Local Address - We manage 65M locations for local office mailers, event-based marketing or face-to-face sales calls

    JOB DATA:
    • Job Title - Standardized titles for ease of use and selection
    • Company Name - The Contact's current employer
    • Job Function - The Company Department associated with the job role
    • Title Level - The Level in the Company associated with the job role
    • Job Start Date - Identify people new to their role as a potential buyer

    EMPLOYER DATA:
    • Websites - Company Website, Root Domain, or Full Domain
    • Addresses - Standardized Address, City, Region, Postal Code, and Country
    • Phone - E.164 phone with country code
    • Social Profiles - LinkedIn, CrunchBase, Facebook, and Twitter

    FIRMOGRAPHIC DATA:
    • Industry - 420 classifications for categorizing the company’s main field of business
    • Sector - 20 classifications for categorizing company industries
    • 4 Digit SIC Code - 239 classifications and their definitions
    • 6 Digit NAICS - 452 classifications and their definitions
    • Revenue - Estimated revenue and bands from 1M to over 1B
    • Employee Size - Exact employee count and bands
    • Email Open Scores - Aggregated data at the domain level showing relationships between email opens and corporate domains
    • IP Address - Company-level IP addresses associated with domains from a DNS lookup

    CONSUMER DATA:
    • Education - Alma Mater, Degree, Graduation Date
    • Skills - Accumulated Skills associated with work experience
    • Interests - Known interests of contact
    • Connections - Number of social connections
    • Followers - Number of social followers

    Download our data dictionary: https://rampedup.io/our-data

  3. Usage Statistics for University of Tasmania EPrints Repository

    • researchdata.edu.au
    Updated 2019
    Cite
    Arthur Sale; A H J Sale (2019). Usage Statistics for University of Tasmania EPrints Repository [Dataset]. https://researchdata.edu.au/usage-statistics-university-eprints-repository/1668483
    Dataset updated
    2019
    Dataset provided by
    University of Tasmania, Australia
    Authors
    Arthur Sale; A H J Sale
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is an active collection of access data for information items in the University of Tasmania's EPrints repository. Each night a scheduled task resumes processing the Apache access logs from where it left off the previous night. Each download of an open access full-text item generates a record in the MySQL database, together with a timestamp and the approximate location of the computer system that made the download. The location is obtained by looking up the IP address in the GeoIP database, with one significant difference: downloads originating from a University of Tasmania IP address are identified separately and removed from the Australia category. This prevents vanity searches from achieving undue significance. Countries are coded using the ISO 3166 two-letter code.

    The dataset has been used to analyse the usage made of the repository and to tune it to achieve maximal visibility for the University of Tasmania. Researchers with items in the repository have used it to identify the types of use being made of their work, and to find potential collaborators. The citation of a work in a journal or conference article, for example, causes a typical step in usage, and the citing article can be searched in Google or Google Scholar to identify the authors. This enhances the dissemination experience and its value.

    The software was written at the University of Tasmania by Professor Arthur Sale (in PHP), based on earlier work by the University of Melbourne (with permission). Mr Christian McGee wrote some critical sections of the code in Perl and set up the cron scheduling.

    The dataset is generated by a computer program written by Professor Arthur Sale. The software was a test bed for ideas, and subsequently resulted in an official software set included in the EPrints distribution. This set expanded on the concepts significantly.

  4. CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, csv
    Updated Feb 26, 2025
    Cite
    Josef Koumar; Karel Hynek; Tomáš Čejka; Pavel Šiška (2025). CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting [Dataset]. http://doi.org/10.5281/zenodo.13382427
    Available download formats: csv, application/gzip
    Dataset updated
    Feb 26, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Josef Koumar; Karel Hynek; Tomáš Čejka; Pavel Šiška
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CESNET-TimeSeries24: The dataset for network traffic forecasting and anomaly detection

    The dataset called CESNET-TimeSeries24 was collected by long-term monitoring of selected statistical metrics for 40 weeks for each IP address on the ISP network CESNET3 (Czech Education and Science Network). The dataset encompasses network traffic from more than 275,000 active IP addresses, assigned to a wide variety of devices, including office computers, NATs, servers, WiFi routers, honeypots, and video-game consoles found in dormitories. Moreover, the dataset is also rich in network anomaly types since it contains all types of anomalies, ensuring a comprehensive evaluation of anomaly detection methods.

    Last but not least, the CESNET-TimeSeries24 dataset provides traffic time series at the institutional and IP subnet levels to cover all possible anomaly detection or forecasting scopes. Overall, the time series dataset was created from 66 billion IP flows containing 4 trillion packets that carry approximately 3.7 petabytes of data. The CESNET-TimeSeries24 dataset is a complex real-world dataset intended to bring insights into the evaluation of forecasting models in real-world environments.

    Please cite the usage of our dataset as:

    Koumar, J., Hynek, K., Čejka, T. et al. CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting. Sci Data 12, 338 (2025). https://doi.org/10.1038/s41597-025-04603-x

    @Article{cesnettimeseries24,
    author={Koumar, Josef and Hynek, Karel and {\v{C}}ejka, Tom{\'a}{\v{s}} and {\v{S}}i{\v{s}}ka, Pavel},
    title={CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting},
    journal={Scientific Data},
    year={2025},
    month={Feb},
    day={26},
    volume={12},
    number={1},
    pages={338},
    issn={2052-4463},
    doi={10.1038/s41597-025-04603-x},
    url={https://doi.org/10.1038/s41597-025-04603-x}
    }

    Time series

    We create evenly spaced time series for each IP address by aggregating IP flow records into time series datapoints. The created datapoints represent the behavior of IP addresses within a defined time window of 10 minutes. The vector of time-series metrics v_{ip, i} describes the IP address ip in the i-th time window. Thus, IP flows for vector v_{ip, i} are captured in time windows starting at t_i and ending at t_{i+1}. The time series are built from these datapoints.

    Datapoints created by the aggregation of IP flows contain the following time-series metrics:

    • Simple volumetric metrics: the number of IP flows, the number of packets, and the transmitted data size (i.e. number of bytes)
    • Unique volumetric metrics: the number of unique destination IP addresses, the number of unique destination Autonomous System Numbers (ASNs), and the number of unique destination transport layer ports. The aggregation of unique volumetric metrics is memory intensive since all unique values must be stored in an array. We used a server with 41 GB of RAM, which was enough for 10-minute aggregation on the ISP network.
    • Ratio metrics: the ratio of UDP/TCP packets, the ratio of UDP/TCP transmitted data size, the direction ratio of packets, and the direction ratio of transmitted data size
    • Average metrics: the average flow duration, and the average Time To Live (TTL)
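The aggregation of flows into 10-minute datapoints can be sketched as below. This is an illustrative assumption, not the authors' pipeline: the flow record keys (`src_ip`, `dst_ip`, `dst_port`, `proto`, `packets`, `bytes`, `start`, `duration`) are hypothetical, output names follow the data fields listed in the dataset description, and the TCP/UDP ratio is assumed to be TCP packets divided by TCP plus UDP packets.

```python
from collections import defaultdict

WINDOW_SECONDS = 600  # 10-minute aggregation window


def aggregate_flows(flows, t0=0):
    """Group flow records per (source IP, window index) and compute metrics.

    Each flow is a dict with the hypothetical keys described above; `start`
    and `t0` are timestamps in seconds.
    """
    buckets = defaultdict(list)
    for flow in flows:
        window = int((flow["start"] - t0) // WINDOW_SECONDS)
        buckets[(flow["src_ip"], window)].append(flow)

    datapoints = {}
    for key, group in buckets.items():
        tcp = sum(f["packets"] for f in group if f["proto"] == "tcp")
        udp = sum(f["packets"] for f in group if f["proto"] == "udp")
        datapoints[key] = {
            # Simple volumetric metrics
            "n_flows": len(group),
            "n_packets": sum(f["packets"] for f in group),
            "n_bytes": sum(f["bytes"] for f in group),
            # Unique volumetric metrics: every distinct value is held in a set,
            # which is why this step is memory intensive at ISP scale.
            "n_dest_ip": len({f["dst_ip"] for f in group}),
            "n_dest_port": len({f["dst_port"] for f in group}),
            # Ratio metric (assumed formula); None when no TCP or UDP packets
            "tcp_udp_ratio_packets": tcp / (tcp + udp) if tcp + udp else None,
            # Average metric
            "avg_duration": sum(f["duration"] for f in group) / len(group),
        }
    return datapoints
```

The sets built for the unique metrics grow with the number of distinct destinations per window, which matches the memory cost noted in the list above.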

    Multiple time aggregation: The original datapoints in the dataset are aggregated over 10 minutes of network traffic. The size of the aggregation interval influences anomaly detection procedures, mainly the training speed of the detection model. However, the 10-minute intervals can be too short for longitudinal anomaly detection methods. Therefore, we added two more aggregation intervals to the dataset: 1 hour and 1 day.

    Time series of institutions: We identify 283 institutions inside the CESNET3 network. These time series, aggregated per institution ID, provide a view of the institution's data.

    Time series of institutional subnets: We identify 548 institution subnets inside the CESNET3 network. These time series, aggregated per institution ID, provide a view of the institution subnet's data.

    Data Records

    The file hierarchy is described below:

    cesnet-timeseries24/
    |- institution_subnets/
    | |- agg_10_minutes/
    | |- agg_1_hour/
    | |- agg_1_day/
    | |- identifiers.csv
    |- institutions/
    | |- agg_10_minutes/
    | |- agg_1_hour/
    | |- agg_1_day/
    | |- identifiers.csv
    |- ip_addresses_full/
    | |- agg_10_minutes/
    | |- agg_1_hour/
    | |- agg_1_day/
    | |- identifiers.csv
    |- ip_addresses_sample/
    | |- agg_10_minutes/
    | |- agg_1_hour/
    | |- agg_1_day/
    | |- identifiers.csv
    |- times/
    | |- times_10_minutes.csv
    | |- times_1_hour.csv
    | |- times_1_day.csv
    |- ids_relationship.csv
    |- weekends_and_holidays.csv

    The following list describes time series data fields in CSV files:

    • id_time: Unique identifier for each aggregation interval within the time series, used to segment the dataset into specific time periods for analysis.
    • n_flows: Total number of flows observed in the aggregation interval, indicating the volume of distinct sessions or connections for the IP address.
    • n_packets: Total number of packets transmitted during the aggregation interval, reflecting the packet-level traffic volume for the IP address.
    • n_bytes: Total number of bytes transmitted during the aggregation interval, representing the data volume for the IP address.
    • n_dest_ip: Number of unique destination IP addresses contacted by the IP address during the aggregation interval, showing the diversity of endpoints reached.
    • n_dest_asn: Number of unique destination Autonomous System Numbers (ASNs) contacted by the IP address during the aggregation interval, indicating the diversity of networks reached.
    • n_dest_port: Number of unique destination transport layer ports contacted by the IP address during the aggregation interval, representing the variety of services accessed.
    • tcp_udp_ratio_packets: Ratio of packets sent using TCP versus UDP by the IP address during the aggregation interval, providing insight into the transport protocol usage pattern. This metric belongs to the interval <0, 1> where 1 is when all packets are sent over TCP, and 0 is when all packets are sent over UDP.
    • tcp_udp_ratio_bytes: Ratio of bytes sent using TCP versus UDP by the IP address during the aggregation interval, highlighting the data volume distribution between protocols. This metric belongs to the interval <0, 1> with the same rule as tcp_udp_ratio_packets.
    • dir_ratio_packets: Ratio of packet directions (inbound versus outbound) for the IP address during the aggregation interval, indicating the balance of traffic flow directions. This metric belongs to the interval <0, 1>, where 1 is when all packets are sent in the outgoing direction from the monitored IP address, and 0 is when all packets are sent in the incoming direction to the monitored IP address.
    • dir_ratio_bytes: Ratio of byte directions (inbound versus outbound) for the IP address during the aggregation interval, showing the data volume distribution in traffic flows. This metric belongs to the interval <0, 1> with the same rule as dir_ratio_packets.
    • avg_duration: Average duration of IP flows for the IP address during the aggregation interval, measuring the typical session length.
    • avg_ttl: Average Time To Live (TTL) of IP flows for the IP address during the aggregation interval, providing insight into the lifespan of packets.

    Moreover, the time series created by re-aggregation contain the following time series metrics instead of n_dest_ip, n_dest_asn, and n_dest_port:

    • sum_n_dest_ip: Sum of numbers of unique destination IP addresses.
    • avg_n_dest_ip: The average number of unique destination IP addresses.
    • std_n_dest_ip: Standard deviation of numbers of unique destination IP addresses.
    • sum_n_dest_asn: Sum of numbers of unique destination ASNs.
    • avg_n_dest_asn: The average number of unique destination ASNs.
    • std_n_dest_asn: Standard deviation of numbers of unique destination ASNs.
    • sum_n_dest_port: Sum of numbers of unique destination transport layer ports.
    • avg_n_dest_port: The average number of unique destination transport layer ports.
    • std_n_dest_port: Standard deviation of numbers of unique destination transport layer ports.
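Unique counts cannot be merged exactly across windows, because the same destination may appear in several 10-minute intervals, so the re-aggregated series summarise them instead. A minimal sketch of that summary, assuming the population standard deviation (the description does not say whether the population or sample form is used) and a hypothetical helper name:

```python
import statistics


def reaggregate_unique_counts(values, name):
    """Summarise per-10-minute unique counts over a longer interval.

    `values` holds the 10-minute counts (e.g. six values for one hour) and
    `name` is the original field name, such as "n_dest_ip".
    """
    return {
        f"sum_{name}": sum(values),
        f"avg_{name}": statistics.fmean(values),
        # Assumption: population standard deviation.
        f"std_{name}": statistics.pstdev(values),
    }
```

For example, `reaggregate_unique_counts(counts, "n_dest_ip")` yields the `sum_n_dest_ip`, `avg_n_dest_ip`, and `std_n_dest_ip` fields for one re-aggregated datapoint.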
