Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of IP ranges in csv format with Country, ASN, City, Country Code. IP range in ipv4 format.
https://www.caida.org/about/legal/aua/public_aua/https://www.caida.org/about/legal/aua/public_aua/
https://www.caida.org/about/legal/aua/https://www.caida.org/about/legal/aua/
A collection of router interface IP addresses geolocated to the city level. 11,857 IP addressed geolocated based on DNS names and 4,838 IP addresses geolocated based on RTT proximity to RIPE Atlas probes. The DNS-based data was created on May 15, 2016. The RTT-proximity data was created from measurements collected on May 25, 2016. The total number of addresses in the dataset is 16586 (109 addresses found to be common between the two sources of data with very similar locations). Data supplement for paper M. Gharaibeh, A. Shah, B. Huffaker, H. Zhang, R. Ensafi, and C. Papadopoulos, A Look at Router Geolocation in Public and Commercial Databases, Proc. Internet Measurement Conference (IMC), Nov 2017.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Login Data Set for Risk-Based Authentication
Synthesized login feature data of >33M login attempts and >3.3M users on a large-scale online service in Norway. Original data collected between February 2020 and February 2021.
This data sets aims to foster research and development for Risk-Based Authentication (RBA) systems. The data was synthesized from the real-world login behavior of more than 3.3M users at a large-scale single sign-on (SSO) online service in Norway.
The users used this SSO to access sensitive data provided by the online service, e.g., a cloud storage and billing information. We used this data set to study how the Freeman et al. (2016) RBA model behaves on a large-scale online service in the real world (see Publication). The synthesized data set can reproduce these results made on the original data set (see Study Reproduction). Beyond that, you can use this data set to evaluate and improve RBA algorithms under real-world conditions.
WARNING: The feature values are plausible, but still totally artificial. Therefore, you should NOT use this data set in productive systems, e.g., intrusion detection systems.
Overview
The data set contains the following features related to each login attempt on the SSO:
Feature | Data Type | Description | Range or Example |
---|---|---|---|
IP Address | String | IP address belonging to the login attempt | 0.0.0.0 - 255.255.255.255 |
Country | String | Country derived from the IP address | US |
Region | String | Region derived from the IP address | New York |
City | String | City derived from the IP address | Rochester |
ASN | Integer | Autonomous system number derived from the IP address | 0 - 600000 |
User Agent String | String | User agent string submitted by the client | Mozilla/5.0 (Windows NT 10.0; Win64; ... |
OS Name and Version | String | Operating system name and version derived from the user agent string | Windows 10 |
Browser Name and Version | String | Browser name and version derived from the user agent string | Chrome 70.0.3538 |
Device Type | String | Device type derived from the user agent string | (mobile , desktop , tablet , bot , unknown )1 |
User ID | Integer | Idenfication number related to the affected user account | [Random pseudonym] |
Login Timestamp | Integer | Timestamp related to the login attempt | [64 Bit timestamp] |
Round-Trip Time (RTT) [ms] | Integer | Server-side measured latency between client and server | 1 - 8600000 |
Login Successful | Boolean | True : Login was successful, False : Login failed | (true , false ) |
Is Attack IP | Boolean | IP address was found in known attacker data set | (true , false ) |
Is Account Takeover | Boolean | Login attempt was identified as account takeover by incident response team of the online service | (true , false ) |
Data Creation
As the data set targets RBA systems, especially the Freeman et al. (2016) model, the statistical feature probabilities between all users, globally and locally, are identical for the categorical data. All the other data was randomly generated while maintaining logical relations and timely order between the features.
The timestamps, however, are not identical and contain randomness. The feature values related to IP address and user agent string were randomly generated by publicly available data, so they were very likely not present in the real data set. The RTTs resemble real values but were randomly assigned among users per geolocation. Therefore, the RTT entries were probably in other positions in the original data set.
The country was randomly assigned per unique feature value. Based on that, we randomly assigned an ASN related to the country, and generated the IP addresses for this ASN. The cities and regions were derived from the generated IP addresses for privacy reasons and do not reflect the real logical relations from the original data set.
The device types are identical to the real data set. Based on that, we randomly assigned the OS, and based on the OS the browser information. From this information, we randomly generated the user agent string. Therefore, all the logical relations regarding the user agent are identical as in the real data set.
The RTT was randomly drawn from the login success status and synthesized geolocation data. We did this to ensure that the RTTs are realistic ones.
Regarding the Data Values
Due to unresolvable conflicts during the data creation, we had to assign some unrealistic IP addresses and ASNs that are not present in the real world. Nevertheless, these do not have any effects on the risk scores generated by the Freeman et al. (2016) model.
You can recognize them by the following values:
ASNs with values >= 500.000
IP addresses in the range 10.0.0.0 - 10.255.255.255 (10.0.0.0/8 CIDR range)
Study Reproduction
Based on our evaluation, this data set can reproduce our study results regarding the RBA behavior of an RBA model using the IP address (IP address, country, and ASN) and user agent string (Full string, OS name and version, browser name and version, device type) as features.
The calculated RTT significances for countries and regions inside Norway are not identical using this data set, but have similar tendencies. The same is true for the Median RTTs per country. This is due to the fact that the available number of entries per country, region, and city changed with the data creation procedure. However, the RTTs still reflect the real-world distributions of different geolocations by city.
See RESULTS.md for more details.
Ethics
By using the SSO service, the users agreed in the data collection and evaluation for research purposes. For study reproduction and fostering RBA research, we agreed with the data owner to create a synthesized data set that does not allow re-identification of customers.
The synthesized data set does not contain any sensitive data values, as the IP addresses, browser identifiers, login timestamps, and RTTs were randomly generated and assigned.
Publication
You can find more details on our conducted study in the following journal article:
Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service (2022)
Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono.
ACM Transactions on Privacy and Security
Bibtex
@article{Wiefling_Pump_2022, author = {Wiefling, Stephan and Jørgensen, Paul René and Thunem, Sigurd and Lo Iacono, Luigi}, title = {Pump {Up} {Password} {Security}! {Evaluating} and {Enhancing} {Risk}-{Based} {Authentication} on a {Real}-{World} {Large}-{Scale} {Online} {Service}}, journal = {{ACM} {Transactions} on {Privacy} and {Security}}, doi = {10.1145/3546069}, publisher = {ACM}, year = {2022} }
License
This data set and the contents of this repository are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. See the LICENSE file for details. If the data set is used within a publication, the following journal article has to be cited as the source of the data set:
Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono: Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service. In: ACM Transactions on Privacy and Security (2022). doi: 10.1145/3546069
Few (invalid) user agents strings from the original data set could not be parsed, so their device type is empty. Perhaps this parse error is useful information for your studies, so we kept these 1526 entries.↩︎
Sign Up for a free trial: https://rampedup.io/sign-up-%2F-log-in - 7 Days and 50 Credits to test our quality and accuracy.
These are the fields available within the RampedUp Global dataset.
CONTACT DATA: Personal Email Address - We manage over 115 million personal email addresses Professional Email - We manage over 200 million professional email addresses Home Address - We manage over 20 million home addresses Mobile Phones - 65 million direct lines to decision makers Social Profiles - Individual Facebook, Twitter, and LinkedIn Local Address - We manage 65M locations for local office mailers, event-based marketing or face-to-face sales calls.
JOB DATA: Job Title - Standardized titles for ease of use and selection Company Name - The Contact's current employer Job Function - The Company Department associated with the job role Title Level - The Level in the Company associated with the job role Job Start Date - Identify people new to their role as a potential buyer
EMPLOYER DATA: Websites - Company Website, Root Domain, or Full Domain Addresses - Standardized Address, City, Region, Postal Code, and Country Phone - E164 phone with country code Social Profiles - LinkedIn, CrunchBase, Facebook, and Twitter
FIRMOGRAPHIC DATA: Industry - 420 classifications for categorizing the company’s main field of business Sector - 20 classifications for categorizing company industries 4 Digit SIC Code - 239 classifications and their definitions 6 Digit NAICS - 452 classifications and their definitions Revenue - Estimated revenue and bands from 1M to over 1B Employee Size - Exact employee count and bands Email Open Scores - Aggregated data at the domain level showing relationships between email opens and corporate domains. IP Address -Company level IP Addresses associated to Domains from a DNS lookup
CONSUMER DATA:
Education - Alma Mater, Degree, Graduation Date
Skills - Accumulated Skills associated with work experience
Interests - Known interests of contact
Connections - Number of social connections.
Followers - Number of social followers
Download our data dictionary: https://rampedup.io/our-data
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data about nola.gov provides a window into how people are interacting with the the City of New Orleans online. The data comes from a unified Google Analytics account for New Orleans. We do not track individuals and we anonymize the IP addresses of all visitors.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains all data collected by the CyberLab honeynet experiment, from May 2019 to February 2020.
The experiment was based on the Cowrie honeypot (https://github.com/cowrie/cowrie, versions 1.6.0 and 2.0.2, see below for the timeline) deployed on approximately 50 nodes at different EU and US universities and companies. This number has varied throughout the duration of the experiment due to scaling efforts and the target node availability. All public IP addresses in the dataset are pseudonymized to protect the identity of the destination nodes.
Each file in the dataset is a daily compilation of all connections starting at midnight on that date (date in filename, midnight in UTC time), grouped into "attack sessions". Each event in such a session includes all the data reported by the honeypot software (https://github.com/cowrie/cowrie). The honeypot has been operating in its default (low-interaction) mode using version 1.6.0 from the start of the experiment until November 8, 2019; after that date, we upgraded to Cowrie version 2.0.2, which allowed us to back it by a pool of real Linux instances to provide more convincing high-interaction mode. Results from high-interaction mode are tagged with "sensor:ubuntu_basic_pool".
Geolocation data was added to Cowrie output messages based on the source IP address.
Field Description =============================== =========================================================== session_id Unique ID of the session dst_ip_identifier Pseudonymized dst public IPv4 of the honeypot node dst_host_identifier Obfuscated (pseudonymized) name of the honeypot node src_ip_identifier Obfuscated (pseudonymized) IP address of the attacker eventid Event id of the session in the cowrie honeypot timestamp UTC time of the event message Message of the Cowrie honeypot protocol Protocol used in the cowrie honeypot; either ssh or telnet geolocation_data/postal_code Source IP postal code as (determined by logstash) geolocation_data/continent_code Source IP continent code (as determined by logstash) geolocation_data/country_code3 Source IP country code3 (as determined by logstash) geolocation_data/region_name Source IP region name (as determined by logstash) geolocation_data/latitude Source IP latitude (as determined by logstash) geolocation_data/longitude Source IP longitude (as determined by logstash) geolocation_data/country_name Source IP full country name (as determined by logstash) geolocation_data/timezone Source IP timezone geolocation_data/country_code2 Source IP country code2 geolocation_data/region_code Source IP region code geolocation_data/city_name Source IP city name src_port Source TCP port sensor Sensor name; serves to identify our experiment config arch Represents the CPU/OS architecture emulated by cowrie duration Session duration in seconds ssh_client_version Attacker's SSH client version username Login username; only used for login events password Password; only used for login events macCS HMAC algorithms supported by the client encCS Encryption algorithms supported by the client kexAlgs Key exchange algorithms supported by the client keyAlgs Public key algorithms supported by the client
More detailed description of the fields (with examples) and all subsequent data (after February 2020) can be found at cyber.ltfe.org.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This respository contains the CLUE-LDS (CLoud-based User Entity behavior analytics Log Data Set). The data set contains log events from real users utilizing a cloud storage suitable for User Entity Behavior Analytics (UEBA). Events include logins, file accesses, link shares, config changes, etc. The data set contains around 50 million events generated by more than 5000 distinct users in more than five years (2017-07-07 to 2022-09-29 or 1910 days). The data set is complete except for 109 events missing on 2021-04-22, 2021-08-20, and 2021-09-05 due to database failure. The unpacked file size is around 14.5 GB. A detailed analysis of the data set is provided in [1].
The logs are provided in JSON format with the following attributes in the first level:
In the following data sample, the first object depicts a successful user login (see type: login_successful) and the second object depicts a file access (see type: file_accessed) from a remote location:
{"params": {"user": "intact-gray-marlin-trademarkagent"}, "type": "login_successful", "time": "2019-11-14T11:26:43Z", "uid": "intact-gray-marlin-trademarkagent", "id": 21567530, "uidType": "name"}
{"isLocalIP": false, "params": {"path": "/proud-copper-orangutan-artexer/doubtful-plum-ptarmigan-merchant/insufficient-amaranth-earthworm-qualitycontroller/curious-silver-galliform-tradingstandards/incredible-indigo-octopus-printfinisher/wicked-bronze-sloth-claimsmanager/frantic-aquamarine-horse-cleric"}, "type": "file_accessed", "time": "2019-11-14T11:26:51Z", "uid": "graceful-olive-spoonbill-careersofficer", "id": 21567531, "location": {"countryCode": "AT", "countryName": "Austria", "region": "4", "city": "Gmunden", "latitude": 47.915, "longitude": 13.7959, "timezone": "Europe/Vienna", "postalCode": "4810", "metroCode": null, "regionName": "Upper Austria", "isInEuropeanUnion": true, "continent": "Europe", "accuracyRadius": 50}, "uidType": "ipaddress"}
The data set was generated at the premises of Huemer Group, a midsize IT service provider located in Vienna, Austria. Huemer Group offers a range of Infrastructure-as-a-Service solutions for enterprises, including cloud computing and storage. In particular, their cloud storage solution called hBOX enables customers to upload their data, synchronize them with multiple devices, share files with others, create versions and backups of their documents, collaborate with team members in shared data spaces, and query the stored documents using search terms. The hBOX extends the open-source project Nextcloud with interfaces and functionalities tailored to the requirements of customers.
The data set comprises only normal user behavior, but can be used to evaluate anomaly detection approaches by simulating account hijacking. We provide an implementation for identifying similar users, switching pairs of users to simulate changes of behavior patterns, and a sample detection approach in our github repo.
Acknowledgements: Partially funded by the FFG project DECEPT (873980). The authors thank Walter Huemer, Oskar Kruschitz, Kevin Truckenthanner, and Christian Aigner from Huemer Group for supporting the collection of the data set.
If you use the dataset, please cite the following publication:
[1] M. Landauer, F. Skopik, G. Höld, and M. Wurzenberger. "A User and Entity Behavior Analytics Log Data Set for Anomaly Detection in Cloud Computing". 2022 IEEE International Conference on Big Data - 6th International Workshop on Big Data Analytics for Cyber Intelligence and Defense (BDA4CID 2022), December 17-20, 2022, Osaka, Japan. IEEE. [PDF]
Company Intelligence
Name and Websites - Company Website and Alternative Domains.
Address - Standardized headquarter Address, City, Region, Zip Code, and Country
LAT / LONG - Used for Geo Location
Locations - Additional office locations of the business
Phone - Standardized headquarter phone with country code
Social Profiles - LinkedIn, CrunchBase, Facebook, Twitter, Yelp, Instagram
Type - Headquarters, Branch, Local Only
Description - detailed overview of the company business model and pursuit.
Industry - Standardized Industries to segment companies by their most notable contributions
Sector - 20 industry groupings
Specialties - Non industry details shared by the company to better understand what they do
SIC Code - 839 industry classifications and their definitions
Revenue - Annual revenue from 1M to over 1B
Employee - Number of Employees at the company
Similar Companies - used to identify competitors Funding - for start up data IP Address - from the hosted website Affiliated Companies - company hierarchy
Our consumer data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.
Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your customer data, gain a deeper understanding of your customers, and power superior client experiences. 1. Geography - City, State, ZIP, County, CBSA, Census Tract, etc. 2. Demographics - Gender, Age Group, Marital Status, Language etc. 3. Financial - Income Range, Credit Rating Range, Credit Type, Net worth Range, etc 4. Persona - Consumer type, Communication preferences, Family type, etc 5. Interests - Content, Brands, Shopping, Hobbies, Lifestyle etc. 6. Household - Number of Children, Number of Adults, IP Address, etc. 7. Behaviours - Brand Affinity, App Usage, Web Browsing etc. 8. Firmographics - Industry, Company, Occupation, Revenue, etc 9. Retail Purchase - Store, Category, Brand, SKU, Quantity, Price etc. 10. Auto - Car Make, Model, Type, Year, etc. 11. Housing - Home type, Home value, Renter/Owner, Year Built etc.
Consumer Graph Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings:
Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).
Consumer Graph Use Cases: 360-Degree Customer View: Get a comprehensive image of customers by the means of internal and external data aggregation. Data Enrichment: Leverage Online to offline consumer profiles to build holistic audience segments to improve campaign targeting using user data enrichment Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity. Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.
Here's the schema of Consumer Data:
person_id
first_name
last_name
age
gender
linkedin_url
twitter_url
facebook_url
city
state
address
zip
zip4
country
delivery_point_bar_code
carrier_route
walk_seuqence_code
fips_state_code
fips_country_code
country_name
latitude
longtiude
address_type
metropolitan_statistical_area
core_based+statistical_area
census_tract
census_block_group
census_block
primary_address
pre_address
streer
post_address
address_suffix
address_secondline
address_abrev
census_median_home_value
home_market_value
property_build+year
property_with_ac
property_with_pool
property_with_water
property_with_sewer
general_home_value
property_fuel_type
year
month
household_id
Census_median_household_income
household_size
marital_status
length+of_residence
number_of_kids
pre_school_kids
single_parents
working_women_in_house_hold
homeowner
children
adults
generations
net_worth
education_level
occupation
education_history
credit_lines
credit_card_user
newly_issued_credit_card_user
credit_range_new
credit_cards
loan_to_value
mortgage_loan2_amount
mortgage_loan_type
mortgage_loan2_type
mortgage_lender_code
mortgage_loan2_render_code
mortgage_lender
mortgage_loan2_lender
mortgage_loan2_ratetype
mortgage_rate
mortgage_loan2_rate
donor
investor
interest
buyer
hobby
personal_email
work_email
devices
phone
employee_title
employee_department
employee_job_function
skills
recent_job_change
company_id
company_name
company_description
technologies_used
office_address
office_city
office_country
office_state
office_zip5
office_zip4
office_carrier_route
office_latitude
office_longitude
office_cbsa_code
office_census_block_group
office_census_tract
office_county_code
company_phone
company_credit_score
company_csa_code
company_dpbc
company_franchiseflag
company_facebookurl
company_linkedinurl
company_twitterurl
company_website
company_fortune_rank
company_government_type
company_headquarters_branch
company_home_business
company_industry
company_num_pcs_used
company_num_employees
company_firm_individual
company_msa
company_msa_name
company_naics_code
company_naics_description
company_naics_code2
company_naics_description2
company_sic_code2
company_sic_code2_description
company_sic_code4
company_sic_code4_description
company_sic_code6
company_sic_code6_description
company_sic_code8
company_sic_code8_description
company_parent_company
company_parent_company_location
company_public_private
company_subsidiary_company
company_residential_business_code
company_revenue_at_side_code
company_revenue_range
company_revenue
company_sales_volume
company_small_business
company_stock_ticker
company_year_founded
company_minorityowned
company_female_owned_or_operated
company_franchise_code
company_dma
company_dma_name
company_hq_address
company_hq_city
company_hq_duns
company_hq_state
company_hq_zip5
company_hq_zip4
co...
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of IP ranges in csv format with Country, ASN, City, Country Code. IP range in ipv4 format.