23 datasets found

Password Strength and Vulnerability Dataset

kaggle.com

zip

Updated Jul 31, 2023

Cite

Utkarsh Singh (2023). Password Strength and Vulnerability Dataset [Dataset]. https://www.kaggle.com/datasets/utkarshx27/passwords

Explore at:

Available download formats: zip (6094 bytes)

Dataset updated

Jul 31, 2023

Authors

Utkarsh Singh

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Data is sourced from Information is Beautiful, with the graphic coming from the same group here.

There's lots of additional information about password quality & strength in the source Doc. Please note that the "strength" column in this dataset is relative to these common aka "bad" passwords and YOU SHOULDN'T USE ANY OF THEM!

Wikipedia has a nice article on password strength as well.

Data Dictionary

`passwords.csv`

variable	class	description
rank	double	popularity in their database of released passwords
password	character	Actual text of the password
category	character	What category does the password fall into?
value	double	Time to crack by online guessing
time_unit	character	Time unit to match with value
offline_crack_sec	double	Time to crack offline in seconds
rank_alt	double	Rank 2
strength	double	Strength = quality of password where 10 is highest, 1 is lowest, please note that these are relative to these generally bad passwords
font_size	double	Used to create the graphic for KIB
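
The `value` and `time_unit` columns together express one duration, so a common first step is to normalize them to seconds for comparison with `offline_crack_sec`. The sketch below is a minimal illustration, not part of the dataset: the function name is hypothetical, the unit spellings are assumed rather than taken from the file, and the month and year lengths are rough conventions.

```python
# Assumed unit spellings; check them against the actual time_unit column.
SECONDS_PER_UNIT = {
    "seconds": 1,
    "minutes": 60,
    "hours": 60 * 60,
    "days": 24 * 60 * 60,
    "weeks": 7 * 24 * 60 * 60,
    "months": 30 * 24 * 60 * 60,   # assumption: 30-day month
    "years": 365 * 24 * 60 * 60,   # assumption: 365-day year
}


def online_crack_seconds(value: float, time_unit: str) -> float:
    """Convert a (value, time_unit) pair into seconds.

    Raises KeyError if the unit is not one of the assumed spellings.
    """
    return value * SECONDS_PER_UNIT[time_unit.strip().lower()]


if __name__ == "__main__":
    print(online_crack_seconds(2, "minutes"))
```

Raising on an unknown unit, rather than guessing, makes unexpected spellings in the column visible early.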

CrackStation's Password Cracking Dictionary
academictorrents.com
bittorrent
Updated Mar 22, 2018
Cite
Defuse Security (2018). CrackStation's Password Cracking Dictionary [Dataset]. https://academictorrents.com/details/fd62cc1d79f595cbe1de6356fb13c2165994e469
Explore at:
Available download formats: bittorrent (4500756826)
Dataset updated
Mar 22, 2018
Dataset authored and provided by
Defuse Security
License
https://academictorrents.com/nolicensespecified
Description
The list contains every wordlist, dictionary, and password database leak that I could find on the internet (and I spent a LOT of time looking). It also contains every word in the Wikipedia databases (pages-articles, retrieved 2010, all languages) as well as lots of books from Project Gutenberg. It also includes the passwords from some low-profile database breaches that were being sold in the underground years ago. The format of the list is a standard text file sorted in non-case-sensitive alphabetical order. Lines are separated with a newline "\n" character. You can test the list without downloading it by giving SHA256 hashes to the free hash cracker or to @PlzCrack on twitter. Here's a tool for computing hashes easily. Here are the results of cracking LinkedIn's and eHarmony's password hash leaks with the list. The list is responsible for cracking about 30% of all hashes given to CrackStation's free hash cracker, but that figure should be taken with a grain of salt because s
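
The description mentions testing the list by submitting SHA256 hashes. As a minimal sketch of how such a hash can be computed locally with Python's standard library (the function name is hypothetical, and UTF-8 encoding of the candidate string is an assumption):

```python
import hashlib


def sha256_hex(candidate: str) -> str:
    """Return the lowercase hex SHA-256 digest of a UTF-8 encoded string."""
    return hashlib.sha256(candidate.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Prints a 64-character hex string for an arbitrary candidate.
    print(sha256_hex("example-candidate"))
```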
Bruteforce Database - Password dictionaries
kaggle.com
zip
Updated Feb 19, 2020
Cite
Taranveer Singh Anttal (2020). Bruteforce Database - Password dictionaries [Dataset]. https://www.kaggle.com/taranvee/bruteforce-database-password-dictionaries
Explore at:
Available download formats: zip (42314895 bytes)
Dataset updated
Feb 19, 2020
Authors
Taranveer Singh Anttal
Description
Password dictionaries:

8-more-passwords.txt: only passwords with more than 8 characters; removed all-numeric passwords, passwords with consecutive characters (3 characters or more), all-lowercase passwords, and passwords without a capital letter and a number (61,682 passwords).
7-more-passwords.txt: passwords of 7 characters or more, with numeric passwords removed (528,136 passwords).
1000000_password_seclists.txt: 1,000,000 passwords from SecLists.
bitcoin-brainwallet.lst: bitcoin-brainwallet list with 394,748 lines.
usernames.txt: collection of US usernames (403,355 lines).
us-cities.txt: list of 20,580 US cities.
facebook-firstnames.txt: 4,347,667 Facebook first names.
2151220-passwords.txt: 2,151,220 passwords from dazzlepod.com.
subdomains-10000.txt: 10,000 subdomains for a domain scanner.
38650-password-sktorrent.txt: 38,650 passwords from sktorrent.eu.
uniqpass_v16_password.txt: UNIQPASS is a large password list for use with John the Ripper (JtR) wordlist mode to translate large numbers of hashes, e.g. MD5 hashes, into cleartext passwords.
indo-cities.txt: list of 102 cities in Indonesia.
38650-username-sktorrent.txt: 38,650 usernames from sktorrent.eu.
forced-browsing: every wordlist you need for forced browsing.

MIT License

Copyright (c) 2015 Van-Duyet Le

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Ultimate Cybersecurity Password & Username Dataset
kaggle.com
zip
Updated Jun 9, 2024
Cite
hrterhrter (2024). Ultimate Cybersecurity Password & Username Dataset [Dataset]. https://www.kaggle.com/datasets/programmerrdai/brute-force-database
Explore at:
Available download formats: zip (42311134 bytes)
Dataset updated
Jun 9, 2024
Authors
hrterhrter
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
8-more-passwords.txt: Contains passwords with more than 8 characters. Excludes numeric-only passwords, consecutive characters (3 or more), all-lowercase passwords, and passwords without at least one capital letter and one number. Total: 61,682 passwords.
7-more-passwords.txt: Includes passwords with 7 characters or more. Numeric passwords are removed. Total: 528,136 passwords.
1000000_password_seclists.txt: A collection of 1,000,000 passwords from SecLists.
bitcoin-brainwallet.lst: Bitcoin brainwallet with 394,748 entries.
usernames.txt: Collection of 403,355 usernames from the US.
us-cities.txt: List of 20,580 cities in the US.
facebook-firstnames.txt: Contains 4,347,667 first names from Facebook.
2151220-passwords.txt: Collection of 2,151,220 passwords from dazzlepod.com.
subdomains-10000.txt: List of 10,000 subdomains for domain scanning.
38650-password-sktorrent.txt: Contains 38,650 passwords from sktorrent.eu.
uniqpass_v16_password.txt: UNIQPASS is a large password list for use with John the Ripper (JtR) in wordlist mode to convert large numbers of hashes, such as MD5, into cleartext passwords.
indo-cities.txt: List of 102 cities in Indonesia.
38650-username-sktorrent.txt: Contains 38,650 usernames from sktorrent.eu.
forced-browsing: Contains every wordlist needed for forced browsing.
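
The selection rules given for 8-more-passwords.txt can be read as a simple predicate. The sketch below is one possible reading, not the uploader's actual script: the function name is hypothetical, and "consecutive characters (3 or more)" is assumed here to mean the same character repeated three or more times in a row.

```python
import re

# Assumption: "consecutive characters" means one character repeated 3+ times.
_REPEAT_RUN = re.compile(r"(.)\1\1")


def keep_in_8_more(password: str) -> bool:
    """Return True if the password passes the assumed 8-more-passwords rules."""
    if len(password) <= 8:            # must have more than 8 characters
        return False
    if password.isdigit():            # numeric-only passwords are excluded
        return False
    if _REPEAT_RUN.search(password):  # runs of 3+ identical characters
        return False
    has_upper = any(c.isupper() for c in password)
    has_digit = any(c.isdigit() for c in password)
    # Requiring an uppercase letter also rules out all-lowercase passwords.
    return has_upper and has_digit
```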
❗RockYou2024.txt| 10B Common Passwords List
kaggle.com
zip
Updated Jul 10, 2024
Cite
BwandoWando (2024). ❗RockYou2024.txt| 10B Common Passwords List [Dataset]. https://www.kaggle.com/datasets/bwandowando/common-password-list-rockyou2024-txt
Explore at:
Available download formats: zip (56987494791 bytes)
Dataset updated
Jul 10, 2024
Authors
BwandoWando
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Kaggle Previous Version of RockYou.txt

The original RockYou.txt dataset was uploaded by @wjburns 5 years ago, with 95K downloads and 640 upvotes, which means Kaggle allows this type of data for research and educational purposes.

Files

I separated the single 160GB txt file into smaller files with filenames based on first character to make it easier to utilize for those with less powerful machines.

letters (A-Z)

digits (0-9)

dollarsymbol ($)

symbols (other symbols)

others (those that can't be categorized by any of those above)
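
The split by first character listed above can be sketched as a small classifier. This is an illustration under assumptions, not the uploader's code: the function name is hypothetical, "letters" is assumed to cover both upper- and lowercase ASCII letters, and "symbols" is assumed to mean ASCII punctuation other than `$`.

```python
import string


def bucket_for(entry: str) -> str:
    """Return the bucket name for a wordlist entry based on its first character."""
    if not entry:
        return "others"
    first = entry[0]
    if first in string.ascii_letters:
        return "letters"
    if first in string.digits:
        return "digits"
    if first == "$":
        return "dollarsymbol"
    if first in string.punctuation:
        return "symbols"
    # Whitespace, non-ASCII characters, and anything else not matched above.
    return "others"
```

For a file of this size, reading line by line and appending each entry to the output file for its bucket keeps memory use flat.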

Note

The original 160GB file was written with UTF-8 encoding; I used the same encoding for the files above.

The contents of the files above are UNSORTED

The contents are NOT DEDUPLICATED

History

https://en.wikipedia.org/wiki/RockYou

Everyone involved with Capture The Flag (CTF) has used the infamous rockyou.txt wordlist at least once, mainly to perform password cracking activities. The file is a list of 14 million unique passwords originating from the 2009 RockYou hack making a piece of computer security history. The “rockyou lineage” has evolved over the years.

https://www.youtube.com/watch?v=0_mQACSn6XM

RockYou2024.txt

With the 2021 version we reached high numbers, but the newest release is (apparently) the ultimate amalgamation. RockYou2024 has been released by the user “ObamaCare”. This new version added 1.5 billion records to the 2021 version, reaching 10 billion records. A wordlist can potentially be used for a multitude of tasks, and having this number of records in a single file, especially in 2024 with increasingly aggressive data breaches, is a dream come true for attackers. The user has not specified the nature of the additional records but states that the new data comes from recently leaked databases.

From "The New RockYou2024 Collection has been published!"

Source

I got it from https://github.com/hkphh/rockyou2024.txt, but it was originally shared by a user known as ObamaCare, with whom I have no affiliation or association.

Original TxtFile

In case you'd like to process the RockYou2024.txt yourself, you can find it here ❗Original RockYou2024.txt zip file

Strong Passwords Only

In case you'd like to see only the "Strong Passwords", you can find it here ❗180 Million "Strong Passwords" in RockYou2024.txt

Cover Image

Generated with Bing Image Generator
Charset distribution of leaked user passwords worldwide 2017
statista.com
Updated Jan 9, 2024
Cite
Statista (2024). Charset distribution of leaked user passwords worldwide 2017 [Dataset]. https://www.statista.com/statistics/744183/worldwide-character-set-distribution-of-passwords/
Explore at:
Dataset updated
Jan 9, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Aug 2017
Area covered
Worldwide
Description
The statistic shows the distribution of password character sets found among various databases leaked online as of 2017. From 320 million hashed passwords that were analyzed, 49 percent were found to be a mix of numbers and lowercase alphabetic characters. Just two percent of passwords were a mix of numbers, upper- and lowercase alphabetic characters, and symbols.
Password Management Market Report
promarketreports.com
doc, pdf, ppt
Updated Apr 11, 2025
Cite
Pro Market Reports (2025). Password Management Market Report [Dataset]. https://www.promarketreports.com/reports/password-management-market-7993
Explore at:
Available download formats: doc, ppt, pdf
Dataset updated
Apr 11, 2025
Dataset authored and provided by
Pro Market Reports
License
https://www.promarketreports.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The size of the Password Management Market was valued at USD 2 Billion in 2023 and is projected to reach USD 6.37 Billion by 2032, with an expected CAGR of 18% during the forecast period. Recent developments include:
July 2022: Google updated its password managers by integrating various highly requested features to help consumers, like auto-login, credential saving, and password generation. This led to enhanced market growth owing to the higher utilization of the Google Chrome browser for web surfing and remote working.
June 2022: Lookout Inc. acquired SaferPass, offering simple and secure password managers for enterprises and individuals. The acquisition helps in delivering proactive security platforms to safeguard user data and privacy while broadening the business footprint.
January 2022: Keeper Security launched Secrets Manager, which secured infrastructure credentials like API keys, certificates, access keys, and database passwords. The solution included cloud-based integration with a zero-knowledge security model similar to their enterprise password management platform.
❗RockYou2024.txt| 180 Million "Strong Passwords"
kaggle.com
zip
Updated Jul 11, 2024
Cite
BwandoWando (2024). ❗RockYou2024.txt| 180 Million "Strong Passwords" [Dataset]. https://www.kaggle.com/datasets/bwandowando/strong-passwords-in-rockyou2024-txt
Explore at:
Available download formats: zip (876464069 bytes)
Dataset updated
Jul 11, 2024
Authors
BwandoWando
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
I extracted all entries from RockYou2024.txt with the following characteristics:
- Between 8 and 32 characters
- Has at least one upper-case character
- Has at least one lower-case character
- Has at least one digit
- Has at least one punctuation mark
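
The extraction criteria above can be expressed as a single predicate. This is a minimal sketch rather than the uploader's actual filter: the function name is hypothetical, and the character classes are assumed to be the ASCII sets from Python's `string` module, with the length bounds taken as inclusive.

```python
import string


def is_strong(password: str) -> bool:
    """Return True if the password meets all five stated criteria."""
    if not 8 <= len(password) <= 32:
        return False
    return (
        any(c in string.ascii_uppercase for c in password)
        and any(c in string.ascii_lowercase for c in password)
        and any(c in string.digits for c in password)
        and any(c in string.punctuation for c in password)
    )
```

Since the note below says white spaces are included, the predicate deliberately does not strip or reject spaces inside an entry.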

Note

The contents per file are DEDUPLICATED

The contents per file are SORTED

White spaces between characters are INCLUDED

Kaggle Previous Version of RockYou.txt

The original RockYou.txt dataset was uploaded by @wjburns 5 years ago, with 95K downloads and 640 upvotes, which means Kaggle allows this type of data for research and educational purposes.

Note

The original 160GB file was written with UTF-8 encoding; I used the same encoding for the files above.

History

https://en.wikipedia.org/wiki/RockYou

Everyone involved with Capture The Flag (CTF) has used the infamous rockyou.txt wordlist at least once, mainly to perform password cracking activities. The file is a list of 14 million unique passwords originating from the 2009 RockYou hack making a piece of computer security history. The “rockyou lineage” has evolved over the years.

https://www.youtube.com/watch?v=0_mQACSn6XM

RockYou2024.txt

With the 2021 version we reached high numbers, but the newest release is (apparently) the ultimate amalgamation. RockYou2024 has been released by the user “ObamaCare”. This new version added 1.5 billion records to the 2021 version, reaching 10 billion records. A wordlist can potentially be used for a multitude of tasks, and having this number of records in a single file, especially in 2024 with increasingly aggressive data breaches, is a dream come true for attackers. The user has not specified the nature of the additional records but states that the new data comes from recently leaked databases.

From "The New RockYou2024 Collection has been published!"

Source

I got it from https://github.com/hkphh/rockyou2024.txt, but it was originally shared by a user known as ObamaCare, with whom I have no affiliation or association.

Variations of Dataset

In case you'd like to process the RockYou2024.txt yourself, you can find it here ❗Original RockYou2024.txt zip file

In case you'd like to download the contents segmentized by contents' first characters, you can find it here ❗RockYou2024.zip| 10 BILLION Common Passwords List For your Research and Analysis needs (160GB)

Reminder

Use responsibly

Cover Image

Generated with Bing Image Generator
Dark Web Gmail Credentials Database
passwordrevelator.net
Updated Oct 29, 2023
Cite
Various Hacking Groups (2023). Dark Web Gmail Credentials Database [Dataset]. https://www.passwordrevelator.net/en/passbreaker
Explore at:
Dataset updated
Oct 29, 2023
Dataset authored and provided by
Various Hacking Groups
Description
Over 3.2 billion email addresses with passwords posted on Dark Web from massive Google platform data leaks
Leaked passwords of the VimeWorld minecraft server
kaggle.com
zip
Updated Dec 9, 2023
Cite
PocoMoco (2023). Leaked passwords of the VimeWorld minecraft server [Dataset]. https://www.kaggle.com/datasets/tempuserpavelbiz/leaked-passwords-of-the-vimeworld-minecraft-server/discussion
Explore at:
Available download formats: zip (88294251 bytes)
Dataset updated
Dec 9, 2023
Authors
PocoMoco
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
About data 🎈

The Minecraft database of the VimeWorld server, containing passwords as md5 and bcrypt hashes. I brute-forced passwords only for the md5 hashes, as they are the easiest to crack. In total, I managed to crack just over 90% of all md5 hashes.

The structure of the parquet file 🟡

username - player nickname;

username_cc - content of certain characters in the nickname, namely:

d - digits

l - lowercase letters

u - uppercase letters

s - special characters

username_len - nickname length

password_value - password

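password_mask - password mask (uses the character codes from item 2, username_cc)

password_cc - content of certain characters in the password (same codes as item 2, username_cc)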

password_len - password length
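
The `d`/`l`/`u`/`s` codes above suggest how the mask and character-content fields could be derived. The sketch below is a guess at the idea, not the uploader's code: the function names are hypothetical, the exact stored format of the `_cc` and `_mask` columns is not given in the description, and any character that is not a digit or a cased letter is assumed to count as special.

```python
def char_code(ch: str) -> str:
    """Map one character to d (digit), l (lowercase), u (uppercase) or s (special)."""
    if ch.isdigit():
        return "d"
    if ch.islower():
        return "l"
    if ch.isupper():
        return "u"
    return "s"


def mask(text: str) -> str:
    """Return the per-character mask, e.g. one code per character of the input."""
    return "".join(char_code(ch) for ch in text)


def char_content(text: str) -> str:
    """Return which codes occur in the text, in the fixed order d, l, u, s."""
    present = set(mask(text))
    return "".join(code for code in "dlus" if code in present)
```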
Secretless Database Connectivity Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Oct 4, 2025
Cite
Growth Market Reports (2025). Secretless Database Connectivity Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/secretless-database-connectivity-market
Explore at:
Available download formats: csv, pdf, pptx
Dataset updated
Oct 4, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Secretless Database Connectivity Market Outlook

According to our latest research, the global Secretless Database Connectivity market size reached USD 1.47 billion in 2024, reflecting robust demand for secure, seamless, and scalable database access solutions across industries. The market is poised to expand at a CAGR of 17.8% from 2025 to 2033, with the forecasted market size projected to reach USD 6.13 billion by 2033. This remarkable growth is primarily driven by the urgent need to address rising cybersecurity threats, regulatory pressures, and the increasing adoption of cloud-native architectures that demand advanced, secretless approaches to database connectivity.

A significant growth factor for the Secretless Database Connectivity market is the escalating sophistication of cyberattacks targeting database credentials and access points. Traditional methods of database authentication, which rely on static secrets such as passwords and API keys, are increasingly vulnerable to breaches and leaks. The shift towards secretless architecture, where credentials are abstracted and managed dynamically without exposing them to applications or users, offers a powerful mitigation against these risks. Enterprises are recognizing the value of secretless solutions in reducing the attack surface, enhancing compliance with regulations like GDPR and HIPAA, and simplifying credential management processes. This awareness is fueling market adoption across sectors, particularly in industries handling sensitive or regulated data.

Another pivotal driver is the rapid proliferation of cloud-native technologies, microservices, and containerized environments. Modern application architectures demand agile, scalable, and automated approaches to database connectivity, which traditional secret management tools often struggle to deliver. Secretless Database Connectivity enables seamless integration with DevOps pipelines, supports dynamic scaling, and eliminates the operational burden of credential rotation and distribution. Organizations pursuing digital transformation and cloud migration initiatives are increasingly turning to secretless solutions to achieve continuous delivery, operational efficiency, and improved developer productivity. This alignment with broader IT modernization trends is expected to sustain high growth rates in the market.

Furthermore, the evolving regulatory landscape is compelling organizations to adopt more robust security postures, including secretless access mechanisms. Regulatory frameworks across North America, Europe, and Asia Pacific are imposing stricter requirements for data privacy, access control, and auditability. Secretless Database Connectivity solutions provide comprehensive logging, access policies, and real-time monitoring capabilities that help organizations demonstrate compliance and avoid costly penalties. As enterprises face mounting pressure to safeguard customer data and intellectual property, the adoption of secretless approaches is becoming a strategic imperative, further accelerating market expansion.

From a regional perspective, North America currently dominates the Secretless Database Connectivity market, accounting for the largest revenue share in 2024 due to the presence of leading technology vendors, early adoption of advanced cybersecurity solutions, and a highly regulated business environment. However, the Asia Pacific region is expected to exhibit the fastest growth rate over the forecast period, driven by rapid digitalization, increasing cloud adoption, and rising awareness of data security best practices among enterprises and government agencies. Europe also represents a significant market, underpinned by stringent data protection regulations and a mature IT ecosystem. The interplay of these regional dynamics is shaping the global competitive landscape and creating new opportunities for vendors and service providers.

Component Analysis

The Secretless Database Connectivity market by component is segmented into software, hardware, and services. The softw
Individuals who have used the Internet in the last 12 months and have a...
data.europa.eu
html, unknown
Updated Jun 11, 2024
Cite
VLADA REPUBLIKE SLOVENIJE STATISTIČNI URAD REPUBLIKE SLOVENIJE (2024). Individuals who have used the Internet in the last 12 months and have a digital certificate or a one-time smsPASS password generator and reasons why they do not have one, by age groups and sex, Slovenia, 2019 [Dataset]. https://data.europa.eu/data/datasets/surs2982325s?locale=en
Explore at:
Available download formats: html, unknown
Dataset updated
Jun 11, 2024
Dataset authored and provided by
VLADA REPUBLIKE SLOVENIJE STATISTIČNI URAD REPUBLIKE SLOVENIJE
Area covered
Slovenia
Description
This database automatically captures metadata whose source is the Government of the Republic of Slovenia, Statistical Office of the Republic of Slovenia, and which corresponds to the source database entitled “Individuals who have used the Internet in the last 12 months and have a digital certificate or one-time password generator smsPASS and the reasons why they do not have them, by age class and sex, Slovenia, 2019”.

Actual data are available in PC-Axis format (.px). With additional links, you can access the source portal page for viewing and selecting data, as well as the PX-Win program, which can be downloaded free of charge. Both allow you to select data for display, change the format of the printout, store it in different formats, view and print tables of unlimited size, and run some basic statistical analyses and graphics.

Data from: SQL Injection Attack Netflow

zenodo.org
portalcientifico.unileon.es

Updated Sep 28, 2022

Cite

Ignacio Crespo; Adrián Campazas (2022). SQL Injection Attack Netflow [Dataset]. http://doi.org/10.5281/zenodo.6907252

Explore at:

Unique identifier

https://doi.org/10.5281/zenodo.6907252

Dataset updated

Sep 28, 2022

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Ignacio Crespo; Adrián Campazas

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Introduction

These datasets contain SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are Union Query SQL injection and Blind SQL injection. To perform the attacks, the SQLMAP tool was used.

NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for the collection and monitoring of network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.

Datasets

The first dataset (D1) was collected to train the detection models, and the other (D2) was collected using different attacks than those used in training, to test the models and ensure their generalization.

The datasets contain both benign and malicious traffic. All collected datasets are balanced.

The version of NetFlow used to build the datasets is 5.

Dataset	Aim	Samples	Benign-malicious traffic ratio
D1	Training	400,003	50%
D2	Test	57,239	50%

Infrastructure and implementation

Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator that has a sensor ipt_netflow installed. The sensor consists of a module for the Linux kernel using Iptables, which processes the packets and converts them to NetFlow flows.

DOROTHEA is configured to use NetFlow V5 and export a flow after it has been inactive for 15 seconds or active for 1800 seconds (30 minutes).
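
The two export conditions can be illustrated with a small predicate. This is a sketch of the timeout logic only, not DOROTHEA or ipt_netflow code: the function and constant names are hypothetical, timestamps are assumed to be in seconds, and treating the thresholds as inclusive is an assumption.

```python
INACTIVE_TIMEOUT_S = 15    # export after 15 seconds without new packets
ACTIVE_TIMEOUT_S = 1800    # export after 30 minutes of total flow lifetime


def should_export(now: float, first_seen: float, last_seen: float) -> bool:
    """Return True if a flow record should be exported at time `now`.

    first_seen: time of the first packet of the flow.
    last_seen:  time of the most recent packet of the flow.
    """
    idle_too_long = (now - last_seen) >= INACTIVE_TIMEOUT_S
    active_too_long = (now - first_seen) >= ACTIVE_TIMEOUT_S
    return idle_too_long or active_too_long
```

A long-lived connection therefore appears as several flow records, one per 30-minute window, while a short request is exported 15 seconds after its last packet.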

Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. Such tasks run as Python scripts. Users may customize them or even incorporate their own. The network traffic is managed by a gateway that performs two main tasks. On the one hand, it routes packets to the Internet. On the other hand, it sends it to a NetFlow data generation node (this process is carried out similarly to packets received from the Internet).

The malicious traffic collected (SQLI attacks) was performed using SQLMAP. SQLMAP is a penetration tool used to automate the process of detecting and exploiting SQL injection vulnerabilities.

The attacks were executed on 16 nodes, which launched SQLMAP with the parameters in the following table.

Parameters	Description
'--banner','--current-user','--current-db','--hostname','--is-dba','--users','--passwords','--privileges','--roles','--dbs','--tables','--columns','--schema','--count','--dump','--comments','--schema'	Enumerate users, password hashes, privileges, roles, databases, tables and columns
--level=5	Increase the probability of a false positive identification
--risk=3	Increase the probability of extracting data
--random-agent	Select the User-Agent randomly
--batch	Never ask for user input, use the default behavior
--answers="follow=Y"	Set a predefined answer (Y) for the "follow" prompt

Every node executed SQLIA on 200 victim nodes. The victim nodes had deployed a web form vulnerable to Union-type injection attacks, which was connected to the MySQL or SQLServer database engines (50% of the victim nodes deployed MySQL and the other 50% deployed SQLServer).

The web service was accessible from ports 443 and 80, which are the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For victim nodes, the address space was 126.52.30.0/24.
The malicious traffic in the test sets was collected under different conditions. For D1, SQLIA was performed using Union attacks on the MySQL and SQLServer databases.

However, for D2, BlindSQL SQLIAs were performed against the web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1. In D2, the IP address space was 152.148.48.1/24 for benign and malicious traffic generating nodes and 140.30.20.1/24 for victim nodes.
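
When labeling flows, it can help to map an address to the role its network played in D1 or D2. The sketch below uses the address spaces quoted above with Python's standard `ipaddress` module; the function name and role labels are hypothetical, and `strict=False` is used because several of the quoted prefixes have host bits set (for example `182.168.1.1/24`).

```python
import ipaddress

# Address spaces as quoted in the description; labels are illustrative only.
NETWORKS = {
    "D1 traffic generator": ipaddress.ip_network("182.168.1.1/24", strict=False),
    "D1 victim": ipaddress.ip_network("126.52.30.0/24", strict=False),
    "D2 traffic generator": ipaddress.ip_network("152.148.48.1/24", strict=False),
    "D2 victim": ipaddress.ip_network("140.30.20.1/24", strict=False),
}


def role_of(address: str) -> str:
    """Return the label of the network containing `address`, or "unknown"."""
    ip = ipaddress.ip_address(address)
    for label, network in NETWORKS.items():
        if ip in network:
            return label
    return "unknown"
```

Note that the generator networks hold both benign and malicious nodes, so the source address alone does not tell the two traffic types apart.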

To run the MySQL server we ran MariaDB version 10.4.12.
Microsoft SQL Server 2017 Express and PostgreSQL version 13 were used.

❗RockYou2024.txt| Original zip file
kaggle.com
zip
Updated Jul 10, 2024
Cite
BwandoWando (2024). ❗RockYou2024.txt| Original zip file [Dataset]. https://www.kaggle.com/datasets/bwandowando/original-rockyou2024-text-file-11-parts
Explore at:
Available download formats: zip (45855445391 bytes)
Dataset updated
Jul 10, 2024
Authors
BwandoWando
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Kaggle Previous Version of RockYou.txt

The original RockYou.txt dataset was uploaded by @wjburns 5 years ago, with 95K downloads and 640 upvotes, which means Kaggle allows this type of data for research and educational purposes.

Files

This is the original RockYou2024.txt file, just zipped and split into 11 parts.

History

https://en.wikipedia.org/wiki/RockYou

Everyone involved with Capture The Flag (CTF) has used the infamous rockyou.txt wordlist at least once, mainly to perform password cracking activities. The file is a list of 14 million unique passwords originating from the 2009 RockYou hack making a piece of computer security history. The “rockyou lineage” has evolved over the years.

https://www.youtube.com/watch?v=0_mQACSn6XM

RockYou2024.txt

With the 2021 version we reached high numbers, but the newest release is (apparently) the ultimate amalgamation. RockYou2024 has been released by the user “ObamaCare”. This new version added 1.5 billion records to the 2021 version, reaching 10 billion records. A wordlist can potentially be used for a multitude of tasks, and having this number of records in a single file, especially in 2024 with increasingly aggressive data breaches, is a dream come true for attackers. The user has not specified the nature of the additional records but states that the new data comes from recently leaked databases.

Quoted from “The New RockYou2024 Collection has been published!”

Source

I got it from https://github.com/hkphh/rockyou2024.txt, but it was originally shared by a user known as ObamaCare, with whom I have no affiliation or association.

Cover Image

Generated with Bing Image Generator
Data from: Malware Finances and Operations: a Data-Driven Study of the Value...
data.niaid.nih.gov
zenodo.org
+1 more
Updated Jun 20, 2023
Nurmi, Juha; Niemelä, Mikko; Brumley, Billy (2023). Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8047204
Explore at:
Dataset updated
Jun 20, 2023
Dataset provided by
Cyber Intelligence House (https://cyberintelligencehouse.com/)
Tampere University
Authors
Nurmi, Juha; Niemelä, Mikko; Brumley, Billy
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and the value chain described in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, presented at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, and published in the ACM International Conference Proceedings Series (ICPS).

Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.

We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.

MalwareInfectionSet

We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.

VictimAccessSet

We demonstrate how Infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.

AccountAccessSet

The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
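
As a rough sanity check on the three dataset sizes quoted above, the average size of one victim file can be estimated from the file counts and total sizes. This is only a sketch: it assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes), and the names `DATASETS` and `average_file_size_kb` are illustrative, not taken from the authors' scripts.

```python
# File counts and total sizes as quoted in the dataset description.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 MB = 10**6 bytes).
DATASETS = {
    "MalwareInfectionSet": (1_800_000, 15 * 10**9),
    "VictimAccessSet": (500_000, 35 * 10**8),
    "AccountAccessSet": (33_896, 400 * 10**6),
}


def average_file_size_kb(file_count: int, total_bytes: int) -> float:
    """Return the mean size of one victim file in kilobytes (1 kB = 1000 bytes)."""
    return total_bytes / file_count / 1000


for name, (count, size) in DATASETS.items():
    print(f"{name}: ~{average_file_size_kb(count, size):.1f} kB per victim file")
```

Under these assumptions each victim file averages on the order of 10 kB, which is consistent with small per-victim JSON records.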

Credits

Authors

Billy Bob Brumley (Tampere University, Tampere, Finland)

Juha Nurmi (Tampere University, Tampere, Finland)

Mikko Niemelä (Cyber Intelligence House, Singapore)

Funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).

Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
List of Top Data Breaches (2004 - 2021)
kaggle.com
zip
Updated Sep 3, 2021
Hishaam Armghan (2021). List of Top Data Breaches (2004 - 2021) [Dataset]. https://www.kaggle.com/datasets/hishaamarmghan/list-of-top-data-breaches-2004-2021
Explore at:
Available download formats: zip (5975 bytes)
Dataset updated
Sep 3, 2021
Authors
Hishaam Armghan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This is a dataset containing the major data breaches in the world from 2004 to 2021.

As we know, there is a big issue related to the privacy of our data. Many major companies in the world still to this day face this issue every single day. Even with a great team of people working on their security, many still suffer. In order to tackle this situation, it is only right that we must study this issue in great depth and therefore I pulled this data from Wikipedia to conduct data analysis. I would encourage others to take a look at this as well and find as many insights as possible.

This data contains 5 columns:

1. Entity: The name of the company, organization or institute

2. Year: The year in which the data breach took place

3. Records: How many records were compromised (can include information like emails, passwords, etc.)

4. Organization type: Which sector the organization belongs to

5. Method: Was it hacked? Were the files lost? Was it an inside job?

Here is the source for the dataset: https://en.wikipedia.org/wiki/List_of_data_breaches

Here is the GitHub link for a guide on how it was scraped: https://github.com/hishaamarmghan/Data-Breaches-Scraping-Cleaning
Dataset of 7632 images of 53 Devanagari Alphabet Images Across 144 Spatial...
data.mendeley.com
Updated May 13, 2025
+ more versions
SANJAY PATE (2025). Dataset of 7632 images of 53 Devanagari Alphabet Images Across 144 Spatial Positions in 5×5 Grids with 2×2 Sub grid Localization for Grid based Graphical Password Authentication. [Dataset]. http://doi.org/10.17632/4x2jtpmtvg.1
Explore at:
Unique identifier
https://doi.org/10.17632/4x2jtpmtvg.1
Dataset updated
May 13, 2025
Authors
SANJAY PATE
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset is a structured image dataset designed to facilitate research in spatial localization, pattern recognition, and character classification. It contains high-resolution images of 53 distinct alphabet characters, each systematically placed within a standardized 5×5 grid layout. Each 5×5 grid consists of 25 individual cells. Within each grid, we define 16 overlapping 2×2 sub-grids. These sub-grids serve as local regions of interest for fine-grained spatial analysis. In each 2×2 sub-grid, there are 9 specific positional locations where an alphabet image can be placed, either centered within the sub-grid or slightly offset relative to it, to provide a range of spatial variation. This results in a total of 144 unique placement positions for each character across the entire 5×5 grid (16 sub-grids × 9 positions). For every alphabet character, the dataset includes an image placed in each of these 144 locations, leading to a comprehensive total of 7,632 labeled samples (53 characters × 144 positions). All samples are consistent in size and format, and the position of each character is precisely annotated to facilitate supervised learning tasks. The Devanagari 53 Alphabet dataset is ideal for training and evaluating models on tasks such as character localization, grid-based graphical password authentication, and few-shot learning under positional variation. The structured spatial layout and extensive position coverage also make it suitable for research in visual attention models, object detection benchmarks, and spatially-aware neural architectures.
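
The counts in this description can be reproduced by enumerating the placements. The sketch below assumes that the 9 positions in each 2×2 sub-grid form a 3×3 set of offsets (centered, or shifted by one step in each direction); the description does not spell out the exact offsets, so `OFFSETS` and the function names are illustrative only.

```python
from itertools import product

GRID_SIZE = 5          # 5x5 grid of cells
SUBGRID_SIZE = 2       # 2x2 sub-grids
OFFSETS = (-1, 0, 1)   # assumed: centered or shifted one step per axis
NUM_CHARACTERS = 53


def subgrid_origins(grid: int = GRID_SIZE, sub: int = SUBGRID_SIZE):
    """Top-left cell of every overlapping sub x sub window in the grid."""
    span = grid - sub + 1  # 4 possible origins per axis for a 5x5 grid
    return list(product(range(span), repeat=2))


def placement_positions():
    """Every (row, col, d_row, d_col) combination of sub-grid and offset."""
    return [
        (row, col, d_row, d_col)
        for row, col in subgrid_origins()
        for d_row, d_col in product(OFFSETS, repeat=2)
    ]


print(len(subgrid_origins()))                       # overlapping sub-grids
print(len(placement_positions()))                   # positions per character
print(NUM_CHARACTERS * len(placement_positions()))  # total labeled samples
```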
Identity as Service Market By Component Type (Provisioning, Directory...
zionmarketresearch.com
pdf
Updated Nov 22, 2025
Zion Market Research (2025). Identity as Service Market By Component Type (Provisioning, Directory services, Password management , Single sign-on , Advanced authentication and Others) , By End Use (BFSI, IT & Telecom, Public ,Healthcare, Retail, Education, and Manufacturing): Global Industry Perspective, Comprehensive Analysis, and Forecast, 2024 - 2032 [Dataset]. https://www.zionmarketresearch.com/report/identity-as-a-service-market
Explore at:
Available download formats: pdf
Dataset updated
Nov 22, 2025
Dataset authored and provided by
Zion Market Research
License
https://www.zionmarketresearch.com/privacy-policy
Time period covered
2022 - 2030
Area covered
Global
Description
The Identity as Service Market size is set to expand from $6.53 billion in 2023 to $57.73 billion by 2032, with a CAGR of around 27.4% from 2024 to 2032.
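
The quoted growth rate can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below assumes nine compounding periods (2023 to 2032); the function name `cagr` is illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the description, in billions of dollars.
rate = cagr(6.53, 57.73, 2032 - 2023)
print(f"{rate:.1%}")
```

Under that assumption the result agrees with the "around 27.4%" stated in the description.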
Training.gov.au - Web service access to sandbox environment
data.wu.ac.at
researchdata.edu.au
Updated Apr 4, 2018
Department of Education and Training (2018). Training.gov.au - Web service access to sandbox environment [Dataset]. https://data.wu.ac.at/schema/data_gov_au/NjcxNmY0NzgtZjYxNi00ZjVkLTkyOGQtZTc2YjE1Mzg3ZGM0
Explore at:
Dataset updated
Apr 4, 2018
Dataset provided by
Department of Education and Training
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
Introduction

Training.gov.au (TGA) is the National Register of Vocational Education and Training in Australia and contains authoritative information about Registered Training Organisations (RTOs), Nationally Recognised Training (NRT) and the approved scope of each RTO to deliver NRT as required in national and jurisdictional legislation.

TGA web-services overview

TGA provides a web service that allows external systems to access and use information stored in TGA. The TGA web service is exposed through a single interface, and web service users are assigned a data reader role, which applies to all data stored in TGA.

The web service can be broadly split into three categories:

RTOs and other organisation types;

Training components including Accredited courses, Accredited course Modules, Training Packages, Qualifications, Skill Sets and Units of Competency;

System metadata including static data and statistical classifications.

Users will gain access to the TGA web service by first passing a user name and password through to the web server. The web server will then authenticate the user against the TGA security provider before passing the request to the application that supplies the web services.

There are two web services environments:

1. Production - ws.training.gov.au – National Register production web services

2. Sandbox - ws.sandbox.training.gov.au – National Register sandbox web services.

The National Register sandbox web service is used to test against the current version of the web services where the functionality will be identical to the current production release. The web service definition and schema of the National Register sandbox database will also be identical to that of production release at any given point in time. The National Register sandbox database will be cleared down at regular intervals and realigned with the National Register production environment.

Each environment has three configured services:

Organisation Service;

Training Component Service; and

Classification Service.
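
A client might capture the two environments and three services listed above in a small lookup table. This is only a sketch: the host names and service names come from the text, while the `Environment` class, the `ENVIRONMENTS` and `SERVICES` names, and `host_for` are hypothetical and not part of the TGA SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    label: str
    host: str


# Hosts as listed in the description above.
ENVIRONMENTS = {
    "production": Environment("Production", "ws.training.gov.au"),
    "sandbox": Environment("Sandbox", "ws.sandbox.training.gov.au"),
}

# Each environment exposes the same three configured services.
SERVICES = (
    "Organisation Service",
    "Training Component Service",
    "Classification Service",
)


def host_for(environment: str) -> str:
    """Return the web service host for 'production' or 'sandbox'."""
    try:
        return ENVIRONMENTS[environment.lower()].host
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None


print(host_for("sandbox"))
```

Keeping the environment choice in one place makes it straightforward to develop against the sandbox first and switch to production once credentials have been issued.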

Sandbox environment access

To access the download area for web services, navigate to http://tga.hsd.com.au and use the username and password below:

Username: WebService.Read (case sensitive)

Password: Asdf098 (case sensitive)

This download area contains various versions of the following artefacts that you may find useful:

• Training.gov.au web service specification document;

• Training.gov.au logical data model and definitions document;

• .NET web service SDK sample app (with source code);

• Java sample client (with source code);

• How to set up a web service client in VS 2010 video; and

• Web services WSDLs and XSDs.

For the business areas, the specification/definition documents and the sample application are a good place to start, while the IT areas will find the sample source code and the video useful for starting development against the TGA web services.

The web services Sandbox end point is: https://ws.sandbox.training.gov.au/Deewr.Tga.Webservices

Production web service access

Once you are ready to access the production web service, please email the TGA team at tgaproject@education.gov.au to obtain a unique user name and password.
Cloud IAM Market By components (password administration and audit, user...
zionmarketresearch.com
pdf
Updated Nov 22, 2025
Zion Market Research (2025). Cloud IAM Market By components (password administration and audit, user provisioning, access administration, directory services, single sign-on, authority and compliance administration), by end-user (instance, small and medium businesses (SMBs) and ventures), By verticals (telecom and IT, BFSI, energy, public sector and utilities, oil and gas, healthcare, edification, manufacturing, and retail) And By Region: - Global And Regional Industry Overview, Market Intelligence, Comprehensive Analysis, Historical Data, And Forecasts, 2023-2030 [Dataset]. https://www.zionmarketresearch.com/report/cloud-iam-market
Explore at:
Available download formats: pdf
Dataset updated
Nov 22, 2025
Dataset authored and provided by
Zion Market Research
License
https://www.zionmarketresearch.com/privacy-policy
Time period covered
2022 - 2030
Area covered
Global
Description
The Cloud IAM Market was valued at $5.59 billion in 2023 and is projected to reach $25.31 billion by 2032, at a CAGR of 18.26% from 2023 to 2032.
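
The projection can be reproduced by compounding the 2023 value forward at the stated rate. The sketch assumes annual compounding over nine periods (2023 to 2032); `project_value` is an illustrative name.

```python
def project_value(start_value: float, annual_rate: float, years: int) -> float:
    """Compound start_value forward by annual_rate for the given number of years."""
    return start_value * (1 + annual_rate) ** years


# Figures from the description, in billions of dollars.
projected_2032 = project_value(5.59, 0.1826, 2032 - 2023)
print(f"${projected_2032:.2f} billion")
```

Under these assumptions the result lands close to the quoted $25.31 billion; the small gap is consistent with rounding of the published growth rate.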

Utkarsh Singh (2023). Password Strength and Vulnerability Dataset [Dataset]. https://www.kaggle.com/datasets/utkarshx27/passwords

Password Strength and Vulnerability Dataset

Password Popularity, Crack Times, and Strength Ratings

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: zip (6094 bytes)

Dataset updated

Jul 31, 2023

Authors

Utkarsh Singh

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Data is sourced from Information is Beautiful, with the graphic coming from the same group here.

Wikipedia has a nice article on password strength as well.

Data Dictionary

`passwords.csv`

variable	class	description
rank	double	popularity in their database of released passwords
password	character	Actual text of the password
category	character	What category does the password fall in to?
value	double	Time to crack by online guessing
time_unit	character	Time unit to match with value
offline_crack_sec	double	Time to crack offline in seconds
rank_alt	double	Rank 2
strength	double	Strength = quality of password where 10 is highest, 1 is lowest, please note that these are relative to these generally bad passwords
font_size	double	Used to create the graphic for KIB
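
Because `value` and `time_unit` together express the online cracking time, a common first step is to convert them to a single unit. The sketch below uses the column names from the data dictionary; the unit spellings in `SECONDS_PER_UNIT`, the month and year lengths, and the sample rows are assumptions and placeholders, not values taken from `passwords.csv`.

```python
import csv
import io

# Assumed unit spellings; months and years are approximated as 30 and 365 days.
SECONDS_PER_UNIT = {
    "seconds": 1,
    "minutes": 60,
    "hours": 3_600,
    "days": 86_400,
    "weeks": 604_800,
    "months": 2_592_000,
    "years": 31_536_000,
}

# Hypothetical placeholder rows using the documented column names.
SAMPLE_CSV = """rank,password,category,value,time_unit,offline_crack_sec,rank_alt,strength,font_size
1,placeholder-a,example-category,2.5,minutes,0.001,1,3,10
2,placeholder-b,example-category,3,days,0.5,2,7,12
"""


def online_crack_seconds(row: dict) -> float:
    """Convert the value/time_unit pair of one row into seconds."""
    return float(row["value"]) * SECONDS_PER_UNIT[row["time_unit"]]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row["rank"], online_crack_seconds(row))
```

An unrecognised unit raises a `KeyError`, which makes unexpected spellings in the real file visible instead of silently miscounting them.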
