Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The dataset is an active collection of access data for information items in the University of Tasmania’s EPrints repository. A scheduled task runs each night and resumes reading the Apache access logs from where it left off the previous night. Each download of an open access full-text item generates a record in the MySQL database, together with a timestamp and the approximate location of the computer system that made the download. The location is obtained by looking up the IP address in the GeoIP database, with one significant difference: downloads originating from a University of Tasmania IP address are identified separately and removed from the ‘Australia’ category. This prevents vanity searches from achieving high significance. Countries are coded using the ISO 3166 two-letter code.
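The country-coding rule described above can be sketched in a few lines of Python. This is only an illustration, not the repository's actual PHP/Perl code: the campus network range, the `UTAS` label, and the `fake_lookup` stand-in for a GeoIP query are all assumptions made for the example.

```python
import ipaddress

# Hypothetical campus range (a documentation-only network from RFC 5737);
# the real University of Tasmania ranges are not given in the description.
UNIVERSITY_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
UNIVERSITY_LABEL = "UTAS"  # hypothetical category label


def classify_download(ip, geoip_lookup):
    """Return the category for one download.

    University addresses get their own label instead of a country code,
    so they are not counted under 'AU'. Everything else is delegated to
    `geoip_lookup`, a caller-supplied function mapping an IP string to a
    two-letter country code.
    """
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in UNIVERSITY_NETWORKS):
        return UNIVERSITY_LABEL
    return geoip_lookup(ip)


# Stand-in for a real GeoIP database query (assumption for the example).
FAKE_GEOIP = {"198.51.100.7": "AU", "203.0.113.9": "DE"}


def fake_lookup(ip):
    return FAKE_GEOIP.get(ip, "??")


if __name__ == "__main__":
    for ip in ("192.0.2.15", "198.51.100.7", "203.0.113.9"):
        print(ip, classify_download(ip, fake_lookup))
```

Checking the university ranges before the GeoIP lookup is what keeps local downloads out of the Australian total.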
The dataset has been used to analyse the usage made of the repository and to tune it to achieve maximal visibility for the University of Tasmania. Researchers with items in the repository have used it to identify the types of use being made of their work, and to find potential collaborators. The citation of a work in a journal or conference article, for example, typically causes a step increase in usage, and the citing article can be searched for in Google or Google Scholar to identify the authors. This enhances the dissemination experience and its value.
The software was written at the University of Tasmania by Professor Arthur Sale (in PHP), based on earlier work by the University of Melbourne (with permission). Mr Christian McGee wrote some critical sections of the code in Perl and set up the cron scheduling.
The dataset is generated by a computer program written by Professor Arthur Sale. The software was a test bed for ideas and subsequently resulted in an official software set included in the EPrints distribution. This set expanded on the concepts significantly.
Sign Up for a free trial: https://rampedup.io/sign-up-%2F-log-in - 7 Days and 50 Credits to test our quality and accuracy.
These are the fields available within the RampedUp Global dataset.
CONTACT DATA:
Personal Email Address - We manage over 115 million personal email addresses
Professional Email - We manage over 200 million professional email addresses
Home Address - We manage over 20 million home addresses
Mobile Phones - 65 million direct lines to decision makers
Social Profiles - Individual Facebook, Twitter, and LinkedIn
Local Address - We manage 65M locations for local office mailers, event-based marketing or face-to-face sales calls.
JOB DATA:
Job Title - Standardized titles for ease of use and selection
Company Name - The Contact's current employer
Job Function - The Company Department associated with the job role
Title Level - The Level in the Company associated with the job role
Job Start Date - Identify people new to their role as a potential buyer
EMPLOYER DATA:
Websites - Company Website, Root Domain, or Full Domain
Addresses - Standardized Address, City, Region, Postal Code, and Country
Phone - E164 phone with country code
Social Profiles - LinkedIn, CrunchBase, Facebook, and Twitter
FIRMOGRAPHIC DATA:
Industry - 420 classifications for categorizing the company’s main field of business
Sector - 20 classifications for categorizing company industries
4 Digit SIC Code - 239 classifications and their definitions
6 Digit NAICS - 452 classifications and their definitions
Revenue - Estimated revenue and bands from 1M to over 1B
Employee Size - Exact employee count and bands
Email Open Scores - Aggregated data at the domain level showing relationships between email opens and corporate domains.
IP Address - Company level IP Addresses associated to Domains from a DNS lookup
CONSUMER DATA:
Education - Alma Mater, Degree, Graduation Date
Skills - Accumulated Skills associated with work experience
Interests - Known interests of contact
Connections - Number of social connections.
Followers - Number of social followers
Download our data dictionary: https://rampedup.io/our-data
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is an active collection of access data for information items in the University of Tasmania's EPrints repository. A scheduled task runs each night and resumes reading the Apache access logs from where it left off the previous night. Each download of an open access full-text item generates a record in the MySQL database, together with a timestamp and the approximate location of the computer system that made the download. The location is obtained by looking up the IP address in the GeoIP database, with one significant difference: downloads originating from a University of Tasmania IP address are identified separately and removed from the Australia category. This prevents vanity searches from achieving high significance. Countries are coded using the ISO 3166 two-letter code.
The dataset has been used to analyse the usage made of the repository and to tune it to achieve maximal visibility for the University of Tasmania. Researchers with items in the repository have used it to identify the types of use being made of their work, and to find potential collaborators. The citation of a work in a journal or conference article, for example, typically causes a step increase in usage, and the citing article can be searched for in Google or Google Scholar to identify the authors. This enhances the dissemination experience and its value.
The software was written at the University of Tasmania by Professor Arthur Sale (in PHP), based on earlier work by the University of Melbourne (with permission). Mr Christian McGee wrote some critical sections of the code in Perl and set up the cron scheduling.
The dataset is generated by a computer program written by Professor Arthur Sale. The software was a test bed for ideas and subsequently resulted in an official software set included in the EPrints distribution. This set expanded on the concepts significantly.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset called CESNET-TimeSeries24 was collected by long-term monitoring of selected statistical metrics for 40 weeks for each IP address on the ISP network CESNET3 (Czech Education and Science Network). The dataset encompasses network traffic from more than 275,000 active IP addresses, assigned to a wide variety of devices, including office computers, NATs, servers, WiFi routers, honeypots, and video-game consoles found in dormitories. Moreover, the dataset is also rich in network anomaly types since it contains all types of anomalies, ensuring a comprehensive evaluation of anomaly detection methods.
Last but not least, the CESNET-TimeSeries24 dataset provides traffic time series on institutional and IP subnet levels to cover all possible anomaly detection or forecasting scopes. Overall, the time series dataset was created from the 66 billion IP flows that contain 4 trillion packets that carry approximately 3.7 petabytes of data. The CESNET-TimeSeries24 dataset is a complex real-world dataset that will finally bring insights into the evaluation of forecasting models in real-world environments.
Please cite the usage of our dataset as:
Koumar, J., Hynek, K., Čejka, T. et al. CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting. Sci Data 12, 338 (2025). https://doi.org/10.1038/s41597-025-04603-x
@Article{cesnettimeseries24,
author={Koumar, Josef and Hynek, Karel and {\v{C}}ejka, Tom{\'a}{\v{s}} and {\v{S}}i{\v{s}}ka, Pavel},
title={CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting},
journal={Scientific Data},
year={2025},
month={Feb},
day={26},
volume={12},
number={1},
pages={338},
issn={2052-4463},
doi={10.1038/s41597-025-04603-x},
url={https://doi.org/10.1038/s41597-025-04603-x}
}
We create evenly spaced time series for each IP address by aggregating IP flow records into time series datapoints. The created datapoints represent the behavior of IP addresses within a defined time window of 10 minutes. The vector of time-series metrics v_{ip, i} describes the IP address ip in the i-th time window. Thus, IP flows for vector v_{ip, i} are captured in time windows starting at t_i and ending at t_{i+1}. The time series are built from these datapoints.
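The windowing described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the flow tuple layout, the metric names `n_flows`, `n_packets`, and `n_bytes`, and the choice to assign each flow to a window by its start timestamp are assumptions made for the example.

```python
from collections import defaultdict

WINDOW_SECONDS = 600  # the 10-minute window from the description


def aggregate_flows(flows, t0):
    """Aggregate flow records into per-IP, per-window datapoints.

    `flows` is an iterable of (ip, start_ts, packets, nbytes) tuples
    (hypothetical layout). `t0` is the start of the first window in
    seconds. Window i covers the half-open interval
    [t0 + i * 600, t0 + (i + 1) * 600).
    """
    series = defaultdict(lambda: {"n_flows": 0, "n_packets": 0, "n_bytes": 0})
    for ip, start_ts, packets, nbytes in flows:
        i = int((start_ts - t0) // WINDOW_SECONDS)  # window index
        point = series[(ip, i)]
        point["n_flows"] += 1
        point["n_packets"] += packets
        point["n_bytes"] += nbytes
    return dict(series)


if __name__ == "__main__":
    demo = [
        ("10.0.0.1", 0, 5, 500),
        ("10.0.0.1", 599, 1, 100),
        ("10.0.0.1", 600, 2, 50),
    ]
    for key, point in sorted(aggregate_flows(demo, t0=0).items()):
        print(key, point)
```

The sketch only emits windows that contain at least one flow; an evenly spaced series would additionally need the empty windows filled in.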
Datapoints created by the aggregation of IP flows contain the following time-series metrics:
Multiple time aggregation: The original datapoints in the dataset are aggregated over 10-minute windows of network traffic. The size of the aggregation interval influences anomaly detection procedures, mainly the training speed of the detection model. However, the 10-minute intervals can be too short for longitudinal anomaly detection methods. Therefore, we added two more aggregation intervals to the dataset: 1 hour and 1 day.
Time series of institutions: We identify 283 institutions inside the CESNET3 network. These time series, aggregated per institution ID, provide a view of each institution's data.
Time series of institutional subnets: We identify 548 institution subnets inside the CESNET3 network. These time series, aggregated per institution subnet ID, provide a view of each institution subnet's data.
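As a rough illustration of moving from 10-minute datapoints to a coarser interval, the Python sketch below sums consecutive datapoints of an additive metric such as a flow or byte count. The function name and the decision to drop an incomplete trailing group are assumptions of the example; counts of distinct values are not additive across windows, so this approach does not apply to them.

```python
def reaggregate(values, factor):
    """Sum consecutive groups of `factor` datapoints.

    With 10-minute input, factor=6 yields hourly values and factor=144
    yields daily values. An incomplete trailing group is dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_groups = len(values) // factor
    return [sum(values[k * factor:(k + 1) * factor]) for k in range(n_groups)]


if __name__ == "__main__":
    ten_minute_counts = [1] * 12  # two hours of 10-minute datapoints
    print(reaggregate(ten_minute_counts, 6))
```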
The file hierarchy is described below:
cesnet-timeseries24/
|- institution_subnets/
| |- agg_10_minutes/
| |- agg_1_hour/
| |- agg_1_day/
| |- identifiers.csv
|- institutions/
| |- agg_10_minutes/
| |- agg_1_hour/
| |- agg_1_day/
| |- identifiers.csv
|- ip_addresses_full/
| |- agg_10_minutes/
| |- agg_1_hour/
| |- agg_1_day/
| |- identifiers.csv
|- ip_addresses_sample/
| |- agg_10_minutes/
| |- agg_1_hour/
| |- agg_1_day/
| |- identifiers.csv
|- times/
| |- times_10_minutes.csv
| |- times_1_hour.csv
| |- times_1_day.csv
|- ids_relationship.csv
|- weekends_and_holidays.csv
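A small Python helper can make the layout above easier to navigate. It is a sketch that relies only on the directory and file names shown in the tree; the function name and the returned dictionary keys are assumptions, and the names of the files inside the `agg_*` directories are not listed here, so the helper stops at the directory level.

```python
from pathlib import Path

LEVELS = {
    "institution_subnets",
    "institutions",
    "ip_addresses_full",
    "ip_addresses_sample",
}
AGGREGATIONS = {"10_minutes", "1_hour", "1_day"}


def dataset_paths(root, level, aggregation):
    """Return the paths relevant to one level and aggregation interval."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    if aggregation not in AGGREGATIONS:
        raise ValueError(f"unknown aggregation: {aggregation}")
    root = Path(root)
    return {
        "series_dir": root / level / f"agg_{aggregation}",
        "identifiers": root / level / "identifiers.csv",
        "times": root / "times" / f"times_{aggregation}.csv",
    }


if __name__ == "__main__":
    paths = dataset_paths("cesnet-timeseries24", "institutions", "1_hour")
    for name, path in paths.items():
        print(name, path.as_posix())
```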
The following list describes time series data fields in CSV files:
Moreover, the time series created by re-aggregation contain the following time series metrics instead of n_dest_ip, n_dest_asn, and n_dest_port: